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TRACT 


The  problem  of  estimating  a  rigid  body  motion  from  two  noisy  images  of 
an  object  taken  at  two  different  times  is  studied. 

The  available  data  consist  of  the  unordered  locations  of  some  of  the 
prominent  points  of  the  object.  Because  these  points  are  not  individually 
recognized  and  some  of  them  may  be  missed  in  one  or  l>oth  of  the  images ,  it 
is  not  obvious  which  points  of  the  two  images  correspond  to  one  another. 
Moreover  the  observed  locations  are  subject  to  error. 

A  computational  procedure,  capitalizing  on  the  rigidity  of  the  object, 
is  proposed  for  estimating  the  motion  parameters  of  the  object  in  the 


presence  of  Gaussian  noise  (K(o, ozI^independently  added  to  the  positions  in 


the  images  of  the  points  observed. 

In  principle,  one  might  apply  maximum  likelihood  to  the  estimation 
problem,  but  the  difficulty  in  formulating  and  calculating  the  likelihood 
function  under  the  above  mentioned  assumptions  if  formidable.  ,  However  one 
ought  to  expect  to  do  better  when  the  observed  points  are  recpfpilzed  and  the 
common  points  among  them  are  matched  without  error  in  identification.  Hence 
this  latter  situation,  which  is  readily  treated^-by  maximum  likelihood, 
should  provide  us  with  lower  bounds  J^xr— the^error  in  our  estimates  for  the 
original  problem. -  _ _ — ' 

Asymptotic' normality  and  consistency  of  the  maximum  likelihood  estimate 
as  o  -» ^  are  derived  for  this  favorable  situation.  Similar  results  hold 
even  wen  o  is  incorrectly  assumed  to  be  aa  -  ko  for  positive  k  #  1. 

A  simulation  study  has  been  carried  out  to  compare  the  efficiency  of 
the  proposed  estimator  for  the  more  complex  problem  with  that  of  the  maximum 
likelihood  estimate  in  the  favorable  situation  and  to  test  the  robustness  of 
the  proposed  estimator  under  the  misspecif ication  of  the  value  of  (ff)— 
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ABSTRACT 

The' problem  of  estimating  a  rigid  body  motion  from  two  noisy  images  of  an 
object  taken  at  two  different  times  is  studied. 

The  available  data  consist  of  the  unordered  locations  of  some  of  the  prominent 
points  of  the  object.  Because  these  points  are  not  individually  recognized  and  some 
of  them  may  be  missed  in  one  or  both  of  the  images,  it  is  not  obvious  which  points 
of  the  two  images  correspond  to  one  another.  Moreover  the  observed  locations  are 
subject  to  error. 

A  computational  procedure,  capitalizing  on  the  rigidity  of  the  object,  is  proposed 
for  estimating  the  motion  parameters  of  the  object  in  the  presence  of  Gaussian 
noise  >/(0,  cr2  Ij)  independently  added  to  the  positions  in  the  images  of  the  points 
observed. 

In  principle,  one  might  apply  maximum  likelihood  to  the  estimation  problem, 
but  the  difficulty  in  formulating  and  calculating  the  likelihood  function  under  the 
above  mentioned  assumptions  is  formidable.  However  one  ought  to  expect  to  do 
better  when  the  observed  points  are  recognized  and  the  common  points  among  them 
are  matched  without  error  in  identification.  Hence  this  latter  situation,  which  is 
readily  treated  by  maximum  likelihood,  should  provide  us  with  lower  bounds  for 
the  error  in  our  estimates  for  the  original  problem. 

Asymptotic  normality  and  consistency  -of  the  maximum  likelihood  estimate  as 
a  — ►  0  are  derived  for  this  favorable  situation.  Similar  results  hold  even  when  a  is 
incorrectly  assumed  to  be  oa  =  ka  for  positive  k  ^  1. 

A  simulation  study  has  been  carried  out  to  compare  the  efficiency  of  the  pro¬ 
posed  estimator  for  the  more  complex  problem  with  that  of  the  maximum  likelihood 
estimate  in  the  favorable  situation  and  to  test  the  robustness  of  the  proposed  esti¬ 
mator  under  the  misspecification  of  the  value  of  a. 
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1.1  Introduction  and  Summary 

The  problem  of  estimating  a  rigid  body  motion  from  two  consecutive  image  frames 
of  an  object  taken  at  two  different  times  is  an  important  issue  in  image  sequence 
analysis.  For  example,  motion  estimation  cannot  be  separated  from  the  problem  of 
image  matching  and  image  registration,  which  in  turn,  have  a  wide  range  of  direct 
applications.  Consequently,  there  is  a  sizable  amount  of  literature  dealing  with  the 
subject,  especially  for  three  dimensional  rigid  body  motion. 

A  large  number  of  the  existing  methods  for  the  three-dimensional  problem  ba¬ 
sically  depend  on  solving  simultaneous  non-linear  equations  that  one  gets  from  the 
known  matched  points  in  the  two  image  frames.  In  the  event  of  noisy  images, 
least  squares  methods  are  applied  by  simply  requiring  a  greater  number  of  matched 
points  than  that  of  the  parameters  to  be  estimated.  However,  they  all  produce 
rather  unsatisfactory  results  when  the  angle  of  rotation  of  the  motion  is  not  small, 
or  when  they  are  applied  to  real  images  (*.e.,  noisy  images). 

In  this  thesis,  an  estimation  method  for  two  dimensional  motion,  capitalizing  on 
the  rigidity  of  the  object  involved  with  the  assumption  that  the  images  are  noisy, 
is  proposed.  It  is  assumed  that  the  data  consist  of  the  unordered  locations  of  some 
of  the  prominent  points  of  the  object.  Because  it  is  assumed  here  that  these  points 
are  not  individually  recognized  and  some  of  them  may  be  missed  in  one  or  both  of 
the  image  frames,  it  is  not  obvious  which  observed  points  in  the  two  image  frames 
correspond  to  one  another.  Moreover  it  is  also  assumed  that  >/(0,  cr2!^)  Gaussian 
noise  is  independently  added  to  the  locations  of  the  points  observed  in  the  two  image 
frames.  Methods  for  this  two  dimensional  problem  should  prove  to  be  helpful  in 
attacking  the  more  practical  three  dimensional  problem  where  some  points  may 
become  occluded. 

In  Chapter  2,  a  survey  of  the  existing  literature  on  motion  estimation  and  the 


' 

i 

3 


related  matching  problems  is  presented.  In  Chapter  3,  the  details  of  the  proposed 
estimation  procedure  and  the  logic  behind  it  are  given.  Then,  in  Chapter  4,  the 
maximum  likelihood  estimate  in  a  more  favorable  situation,  where  the  observed 
points  are  recognized  and  the  common  points  among  them  are  matched  without 
error  in  identification,  is  studied.  Asymptotic  normality  and  consistency  of  the 
maximum  likelihood  estimate  as  a  — ►  0  for  this  latter  situation  are  derived.  Also 
similar  results  are  derived  even  when  a  is  incorrectly  assumed  to  be  cra  =  kcr  for 
positive  k  ±  1.  These  provide  us  with  lower  bounds  for  the  error  in  our  estimates  for 
the  original  problem.  Chapter  5  shows  some  simulation  results.  Finally  Chapter  6 
has  conclusions  and  suggestions  for  further  work. 

In  the  next  section,  we  list  some  definitions  and  notation  used  throughout. 

1.2  Definitions  and  Notation 

Let  X  be  an  m-dimensional  real  vector  valued  random  variable.  We  denote  the  law 
( distribution )  of  X  by  £[X].  Sometimes  we  use  the  notation  £[X|/9]  instead  of  £[X] 
to  emphasize  the  fact  that  the  distribution  of  X  is  dependent  on  the  parameter  p. 
We  write  E[X],  and  Cov[X]  to  denote  the  mean  vector,  and  covariance  matrix  of  X 
respectively.  In  case  X  is  a  1-dimensional  real  random  variable,  Cov[X]  is  reduced 
to  the  variance  Var[X]. 

A  Gaussian  (normal)  distribution  with  mean  vector  fx  and  covariance  matrix  S 
is  denoted  by  M(n,  E). 

We  interpret  any  vector  to  be  a  column  vector  when  it  is  referred  to  only  by  its 
name  in  equations  or  formulae  unless  otherwise  specified. 

We  denote  by  Rm,  the  m-dimensional  Euclidean  real  vector  space,  and  use 
(— 7r,7r]  to  denote  the  set  {u  €  Rl  |  —n  <  u  <  tt}. 

The  notation  is  to  be  read  “by  definition  is  equal  to”. 


The  m  by  m  identity  matrix  is  denoted  by 


1  0  •••  0 

°  1  X  1  -  (1-1) 
:  ■*.  *•.  0 
0  ...  0  1  . 

We  denote  by  lm,  the  m-dimesional  vector  with  m  l’s  as  its  components: 


For  any  matrix  A,  we  use  AT  to  denote  the  transpose  of  A. 

For  any  pj  by  q2  matrix  A  and  p\  by  q2  matrix  B,  we  define  the  direct  product 
or  Krontckcr  product  of  A  and  B,  denoted  by  A  ®  B,  to  be  the  pip?  by  q2q2  matrix 
C  where 

A&n  A612  ...  A6i„ 

A&2x  A62j  •  •  •  A6j9i 

•  •  • 

•  •  • 

•  •  • 

A6Pli  A6ia  •••  AfrPlfl 
where  is  the  (*,  j)-th  element  of  B,  for  i  =  1, . . .  ,p\,  j  =  1, . . . ,  q2. 

For  any  vector  x  :=  (xx,...,xm)T  G  Rm,  we  use  j|x||  to  denote  the  Euclidean 


(1.4) 

(1.5) 


The  covering  map  on  S1,  c*  :  S1  — ►  (—*■,*•],  assigns  to  each  x  €  S  ,  a  c*(z)  G 
(— ir,jr]  where 

(coec*(x),sine*(x))  =  x.  (1.7) 

We  will  also  define  the  covering  map  on  R1,  c  :  R1  — ►  (— jt,  jt],  by 

c(x)  :=  e*(cosx,sinx),  (1.8) 

for  each  x  G  R1.  Note  that  for  any  x  G  Rl,  we  have 

c(x)  =  x  (mod  2ir).  (1.9) 

The  function  arctan  :  R1  \  {0}  — ►  ( — w,  w]  is  defined  by 


arctan(x,y)  :=  c*u(x,y) 


(1.10) 


for  each  (x,y)  G  R*  \  {0}.  We  call  arctan(x,y),  the  angle  of  the  vector  (x,y).  Note 
that  for  any  (x,y)  G  R*  \  {0},  we  have  that 


tan(arctan(x,  y))  =  — 
x 


(1.11) 


arctan(x,y)  ^  arctan(-x, -y). 


(1.12) 


For  any  two  vectors  t>i,  t>j  G  RJ,  the  difference  vector  of  Vj  and  v2  denoted  by 
D(vi,»2)  is  defined  by 

D(vu  vi):=v2-vl.  (1-13) 

For  a  function  t :  Rm  — ►  Rd,  where 


1 


we  define  the  Jacobian  matrix  or  derivative  of  t  with  respect  to  v,  when  it  exists,  to 
be  the  d  by  m  matrix  dt(v)/dv  whose  (t,j)-th  element  is  given  by 


dt(v) 


dv 


dti(vi,...,vm) 


for  t ’  =  1, . . . ,  d,  j  =  1, ,  m, 


(1.15) 


dvi 

where  v  =  (t>i, . . . ,  vm). 

The  2  by  2  orthogonal  matrix  representing  the  rotation  by  0  radians  about  the 
origin  in  a  plane  is  denoted  by  U(0): 


U(«) 


cos  9  —  sin  9 
sin  9  cos  0 


(1.16) 


We  use  T  to  denote  the  vector  of  translation  by  Tzt  Tv  in  the  x-  and  y-coordinates 
respectively, 


(1.17) 


A  rigid  body  motion  of  an  object  in  a  plane  is  a  transformation  of  the  object  that 
can  be  obtained  through  a  rotation  and  a  translation  only.  Thus  a  general  rigid 
body  motion  can  be  uniquely  characterized  in  terms  of  a  rotation  about  the  origin 
.followed  by  a  translation  in  the  plane.  In  other  words,  a  rigid  body  motion  is  a  map 


(x,y)  i — »  (z\y') 


for  any  point  (x,  y)  on  the  object,  where 

+  T  (1.18) 

for  some  uniquely  defined  U(0)  and  T. 

An  image  frame  is  a  two-dimensional  plane  that  contains  an  image  of  the  object 
in  which  we  are  interested,  where  some  globally  fixed  (for  different  image  frames) 
Euclidean  coordinates,  which  we  call  the  image  frame  coordinates ,  are  superim¬ 
posed. 


A  location  ( position )  of  a  point  in  an  image  frame  is  the  image  frame  coordinate 
vector  of  the  point. 

We  will  treat  an  object  as  represented  by  a  finite  set  of  points  on  it,  called 
prominent  points.  In  the  simulations  in  Chapter  5,  we  will  assume  that  there  are 
10  prominent  points.  A  prominent  point  is  said  to  be  observed  if  we  see  the  point  in 
an  image  frame  but  cannot  necessarily  distinguish  which  of  the  several  prominent 
points  it  is.  Also  a  prominent  point  is  said  to  be  recognized  if  we  see  the  point  and 
can  identify  which  of  the  several  prominent  points  it  is.  The  noisy  locations  of  the 


observed  points  on  the  object  form  the  data  in  our  problem. 
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There  are  two  separate  approaches  in  motion  estimation  problems.  One  of  them, 
the  so  called  “low  level "  or  “ signal  processing”  approach,  is  the  one  where  you  have 
grey  level  images  as  your  data  and  you  try  to  estimate  the  motion  by  operating  on 
the  raw  data.  One  of  the  methods  in  that  approach  is  the  Fourier  method  for  the 
two-dimensional  motion  problem,  where  you  take  advantage  of  the  fact  that  sharp 
straight  edges  give  rise  to  line  spectra  in  the  frequency  domain,  and  hence  allow 
you  to  estimate  the  angle  of  rotation.  Another  low  level  method  is  the  matching  or 
correlation  method  that  sets  up  a  cost  function  depending  on  the  grey  level  functions 
of  pixels  in  each  image  frame  and  tries  to  find  the  value  of  the  motion  parameters 
which  minimizes  the  cost  function.  There  is  another  method  called  the  method  of 
differentials  in  the  low  level  approach  which  uses  the  idea  of  relating  time  differences 
to  spatial  differences.  All  of  the  three  methods  mentioned  above  are  described  in 
[Huang  81]. 

The  other  approach  is  the  “high  level "  or  “feature  based ”  approach,  where  we 
assume  that  some  features,  e.g.,  points,  lines,  contours,  etc.,  in  the  images  have 
already  been  extracted  by  some  other  means  and  the  data  consist  of  the  locations 
of  the  features.  Normally  it  is  assumed  that  the  correspondence  between  the  features 
has  already  been  established  in  the  feature  extraction  process.  In  this  sense,  the 
estimation  method  to  be  proposed  in  this  thesis  is  somewhere  between  the  two 
approaches  but  closer  to  the  feature  based  approach  because  we  have  as  data  the 
unordered  observed  locations  of  some  of  the  prominent  points  in  two  image  frames 
but  they  are  not  recognized  and  may  be  missing  in  one  of  the  two  image  frames. 

The  high  level  methods  may  then  be  further  classified  into  two  classes,  namely 
the  “ equation  solvers "  and  the  “merit  score  or  weight  maximizers” .  The  equation 
solvers  basically  assume  that  the  correspondence  between  features  in  each  image 
frame  are  known  and  hence  derive  sets  of  simultaneous  equations,  usually  non-linear, 
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with  motion  parameters  as  unknowns.  Researchers  in  this  line  of  approach  include 
Ullman  [Ullman  79],  Roach  and  Aggarwal  [Roach  79, Roach  80],  Nagel  [Nagel  81a], 
Nagel  and  Neumann  [Nagel  81b],  Tsai  and  Huang  [Tsai  81, Tsai  84],  Tsai,  Huang 
and  Zhu  [Tsai  82],  Fang  and  Huang  [Fang  83a, Fang  83b].  Methods  for  solving  such 
equations  are  generally  iterative  and  require  good  initial  guesses  of  unknowns.  How¬ 
ever,  sensitivity  to  noise  is  shown  in  the  experiments  reported  by  Fang  and  Huang 
[Fang  83a, Fang  83b]. 

The  merit  score  maximizers  often  bear  the  names  of  “ point  pattern  matching ” 
or  “image  matching"  instead  of  motion  estimation.  They  generally  deal  with  two- 
dimensional  motion  only.  Even  though  they  usually  assume  that  the  first  image 
frame  contains  the  prototype  or  model  pattern  with  which  the  observed  point  pat¬ 
tern  in  the  second  is  to  be  matched,  they  essentially  attack  the  same  estimation 
problem  as  mentioned  in  Section  1.1.  In  these  settings,  as  the  name  “matching” 
says,  the  individual  correspondence  between  features  in  different  image  frames  are 
not  assumed.  They  implicitly  or  explicitly  assign  merit  scores  or  weights  to  possi¬ 
ble  matches  of  features  according  to  their  compatibility  with  the  candidate  value 
of  motion  parameters,  and  then  try  to  maximize  the  total  merit  scores  to  get  the 
maximizing  value  of  motion  parameters.  Our  method  to  be  proposed  in  this  thesis 
belongs  to  this  class  but  is  not  concerned  with  explicit  matching.  We  list  some  of 
the  existing  methods  that  fall  into  this  class  below. 

Simon,  Checroun  and  Roche  [Simon  72]  compute  all  interpoint  distances  in  each 
point  pattern  and  then  use  a  comparison  of  the  sorted  list  of  these  with  some 
relaxation  rule  to  match  the  point  patterns.  Their  method  is  applicable  only  to 
patterns  that  contain  equal  numbers  of  points,  i.e.,  there  are  no  missing  prominent 
points. 

Seidl  [Seidl  74]  measures  similarity  of  point  patterns,  without  seeking  an  ex- 


plicit  matching,  using  nearest  neighbor  relations.  Again  the  method  is  restricted  to 
patterns  that  contain  equal  numbers  of  points  in  application. 

Zahn  [Zahn  74]  compares  the  minimal  spanning  trees  of  point  patterns,  but  this 
method  is  also  sensitive  to  missing  and  extra  points. 

Kahl,  Rosenfeld  and  Danker  [Kahl  80]  only  consider  small  (<  10°)  angles  of 
rotation.  Even  though  their  method  is  sensitive  to  noise  exceeding  a  preset  level,  it 
is  insensitive  to  missing  or  extra  points. 

Ranade  and  Rosenfeld  [Ranade  80]  consider  the  situation  where  only  translation 
is  allowed.  Each  individual  point  matching  is  rated  by  its  effect  on  other  points, 
and  through  relaxation,  an  overall  matching  is  converged  to.  Experimental  results 
are  given  showing  a  tolerance  to  some  noise. 

Lavine,  Lambird  and  Kanal  [Lavine  81]  try  to  recognize  point  patterns  without 
finding  an  explicit  matching  using  sequences  of  interpoint  distances  and  show  the 
“correctness”  of  their  method  under  a  plausible  noise  model. 

Bolles  [Bolles  79]  and  Ogawa  [Ogawa  86]  apply  maximal  clique  techniques  to  the 
point  pattern  matching  problems,  and  the  latter  uses  the  Delaunay  triangulation 
to  partition  a  point  pattern  into  a  set  of  triangles  reducing  the  computational  cost 
of  matching. 

Baird  [Baird  84]  proposes  a  method  for  the  situation  where  there  are  no  miss¬ 
ing  or  extra  points,  allowing  noise  whose  bounds  are  specified  as  arbitrary  convex 
polygons  about  each  model  point  location.  He  also  gives  the  comparisons  of  various 
methods  in  terms  of  the  asymptotic  order  of  computational  runtime  as  the  number 
of  feature  points  increases. 

In  summary,  only  a  few  researchers  have  provided  methods  to  attack  the  motion 
estimation  problem  which  allows  an  arbitrary  rigid  body  motion,  when  there  are 
missing  and  extra  points,  and  at  the  same  time  the  locations  of  points  are  observed 


with  moderate  noise.  It  is  exactly  this  situation  that  we  want  to  deal  with  in  this 
thesis. 


Chapter  3 

Method  of  Estimation  in  the 
Presence  of  Noise 


3.1  Problem  Statement 


Suppose  that  we  have  two  consecutive  noisy  image  frames,  taken  at  times  t\  and  t2, 
(tx  <  tj),  of  an  object  in  two  dimensional  space  and  that  the  object  was  subjected 
to  a  rigid  body  motion  in  the  time  between  the  two  image  frames.  Assume  that 
all  points  of  the  object  lie  on  a  plane  called  the  object  plane  parallel  to  the  image 
frames. 

The  case  we  consider  is  where  we  observe  some  of  the  prominent  points  but  do 
not  recognize  them  individually  and  some  of  the  observed  points  may  be  missed  in 
one  or  the  other  of  the  two  image  frames.  Thus  it  is  not  obvious  which  subsets  of 
the  observed  points  in  the  two  image  frames  correspond  to  one  another.  Moreover 
the  observed  locations  are  assumed  to  be  noisy  with  Gaussian  noise  >/(0,ct2I2) 
independently  added  to  each.  The  assumption  of  equal  variance  a3  for  both  image 
frames  corresponds,  in  one  interpretation,  to  an  assumption  that  both  image  frames 
are  at  the  same  distance  from  the  observer. 

Now  the  problem  is  to  estimate  the  motion  parameters,  namely  the  angle  6 
of  rotation  about  the  origin  and  the  translation  vector  T,  given  the  data  of  the 
locations  of  the  observed  points  on  the  object  in  the  two  image  frames. 

In  Figure  3.1,  we  have  an  example  of  simulated  data,  namely,  two  image  frames 
showing  the  noisy  locations  of  the  observed  points  at  two  different  times  t\  and 
tj,  when  there  has  been  a  rotation  by  90°  (=  it/ 2  radians)  followed  by  a  slight 
translation. 

3.2  Basic  Idea  and  Motivation 

3.2.1  The  Number  of  Common  Observed  Points  m0 

Let  us  assume  that  there  are  n  distinct  prominent  points  on  the  object  in  which 


Image  Frame  1 


Image  Frame  2 


Figure  3.1:  An  example  of  simulated  data. 

we  are  interested.  Let  mi,  m2  be  the  numbers  of  the  observed  points  in  image 
frames  1,  2  respectively.  Also  let  m0  be  the  number  of  common  observed  points  in 
the  two  image  frames.  For  example,  in  the  simulated  data  illustrated  in  Figure  3.1, 
we  have 

n  =  10,  mi  =  7,  m2  —  8,  m o  =  5. 

If  we  assume  that  each  of  the  n  prominent  points  on  the  object  has  independent 
probability  p  of  being  observed  in  any  image  frame,  then  the  above  mentioned  mi, 
mj,  and  m0  become  random  variables  which  we  call  Mi,  M2,  and  M0  respectively. 
The  probability  P(mi,m2)  of  getting  mi  observed  points  in  image  frame  1,  and  m2 
in  image  frame  2  is  given  by 

P(mi,m2)  :=  P{Mi  =  mi,  M2  =  m2} 

=  P{Mi  =  ml)P{M2  =  m2} 

=  (m,)”"''1  (  m,  ) '’"’I1  - 

j  (3>1) 

which  is  just  the  product  of  two  binomial  densities,  each  with  parameters  (n,p). 
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The  conditional  probability  of  getting  exactly  m0  common  observed  points  given 
that  we  have  mi  observed  points  in  image  frame  1,  and  m2  in  image  frame  2  is  given 


by 


P{Mq  =  mo  |  Mi  =  mi.  Mi  =  m3}  = 


(  n  \  /  « -  mo  W  n-  mx  \ 
_  \  m0  J  mt  -  mp  )  \  m2  -  m0  ) 


(DTD 

mi!mi!(n  —  m3)!(fi  —  m2)! 


_  _ -  ■  _  (3.2) 

n!mo!(mi  —  mo)!(m2  —  mo)!(n  —  mi  —  m2  +  mo)!  v  ’  J 

which  is  the  hypergeometrie  density  with  parameters  (n,  mi,  m2).  A  hypergeometric 
random  variable  X  with  parameters  ( N,r,k )  has  the  following  interpretation: 


Consider  a  population  of  N  items,  of  which  k  are  of  type  I  and  N  —  k 
are  of  type  II.  Assume  that  a  random  sample  of  size  r  is  drawn  without 
replacement  from  the  population.  Then  X  can  be  defined  as  the  number 
of  items  of  type  /  in  the  sample  drawn. 

From  this  interpretation,  it  is  easy  to  see  that 

E[*l  =  ±  (3.3) 

and 

=  £0  -  y)d  -  (3.4) 

Therefore  in  our  case,  we  get  the  conditional  mean  and  standard  deviation  of 

M0  given  {Mx  =  mi, Mi  —  m2}  as 

n(m0  |  mx, mi)  :=  E [M0  |  Mi  =  mx,Mi  =  m2]  =  mirrti  (3.5) 

n 

and 


a(m0  |  m1,m2)  :=  ^/Var[M0  |  Mx  =  mlf  M2  =  m2] 

1  jmimi(n  —  mi)(n  —  m2) 

nV  n  -  1  ' 


(3.6) 


For  example,  if  n  =  10,  p  =  .8,  mi  =  7,  m2  =  8,  then  we  have 

P(7, 8)  «  .0608 
p(m0  |  7,8)  =  5.6 
o(m0  |  7, 8)  «  .6110. 

For  the  tabulation  of  P(mi,mj),  fi(mo  |  mi,m2),  a(mo  |  for  mi,  m2  >  3 

when  n  =  10,  p  =  .8,  see  Table  3.1.  For  the  cases  where  m2,  m2  <  3,  we  have 
P(mi,m2)  <  10-4. 

In  the  simulation  study  in  Chapter  5,  we  will  evaluate  the  characteristics  of  the 
estimation  procedure  to  be  proposed  for  two  examples,  in  each  of  which  m0,  m1  and 
m2  attain  certain  values.  Thus  our  evaluations  may  be  considered  to  be,  in  part, 
conditional  on  those  values  of  m0,  m2  and  m2. 

3.2.2  Heuristics  of  the  Method  of  Estimation 

In  order  to  understand  the  main  idea  which  is  employed  in  the  estimation  method 
to  be  proposed  in  the  next  section,  let  us  first  consider  the  situation  where  no  noise 
is  present  in  the  locations  of  the  points  observed  in  the  two  image  frames. 

Because  of  the  rigidity  of  the  object  involved,  the  difference  vector  of  any  two 
locations  of  the  m0  common  observed  points  should  not  change  its  length  from  one 
image  frame  to  the  other.  In  addition,  the  change  in  angle  should  be  6  (mod  2n) 
where  6  G  (— 7r,7r]  is  the  amount  of  the  angle  of  rotation  involved.  Therefore,  if  we 
were  to  look  at  all  the  possible  pairs  of  difference  vectors  (of  the  locations),  one 
from  each  image  frame  with  the  lengths  of  the  two  being  the  same,  then  a  clear 
majority  of  them  would  show  6  as  the  exact  difference  in  angles  of  the  two  in  each 
pair. 

However,  this  does  not  give  a  complete  answer  to  this  rather  easy-looking  prob¬ 
lem.  For  example,  if,  in  considering  the  prominent  points,  there  are  two  difference 


vectors  of  the  same  length,  in  either  one  of  the  two  image  frames,  then  they  may 
contribute  to  incorrectly  estimating  the  value  of  0  when  there  are  missing  points  in 
the  image  frames.  Even  if  these  difference  vectors  were  parallel  and  contributed  to 
estimating  the  value  of  $  correctly,  they  could  still  give  confusing  and  misleading 
information  about  the  translation  vector  T.  In  order  to  alleviate  this  problem  to  a 
certain  extent,  we  need  to  find  additional  evidence  that  a  pair  of  difference  vectors, 
one  in  each  image  frame,  is  a  truly  corresponding  pair  with  respect  to  the  value  of 
8  which  it  seems  to  support. 

We  propose  that  being  one  of  the  three  corresponding  pairs  of  edges  of  two 
triangles,  each  triangle  formed  by  three  observed  points,  say  At,  B< ,  C,  in  each  image 
frame  *,  i  =  1,2,  so  that  all  three  pairs  of  difference  vectors  [D[Ai,  Bi),  D{A2,  B2)) , 
(. D{B\,C\),D{B2,C2 )),  (D(Ci,  Ai),D(C2,A2))  have  contributed  in  supporting  the 
same  value  of  8,  constitutes  that  evidence.  We  call  such  a  pair  of  triangles  an 
admissible  match  of  triangles  for  that  particular  value  of  8.  So  by  maximizing  the 
number  of  admissible  matches  of  triangles  over  the  possible  values  of  0,  we  are  likely 
to  get  a  quite  reliable  estimate  of  8. 

Once  we  have  an  estimate  of  0,  it  is  easy  to  get  an  estimate  of  T.  After  rotating 
the  points  in  the  first  image  frame  by  an  angle  of  0  about  the  origin,  the  differences 
in  location  of  matching  points  will  provide  an  estimate  of  T.  What  constitutes 
matching  points  can  be  determined  from  the  vertices  of  triangles  that  have  been 
admissibly  matched. 

It  should  be  noted  that  it  is  still  possible  that  a  triangle  may  form  admissible 
matches  with  several  triangles  even  for  a  common  value  of  0. 

Now  let  us  return  to  the  problem  where  we  have  Gaussian  noise  >/(0,o2l2)  in¬ 
dependently  added  to  each  location  of  the  points  observed  in  the  two  image  frames. 
Here  we  may  expect  neither  the  lengths  of  the  two  truly  corresponding  difference 


vectors,  one  from  each  image  frame  to  be  exactly  the  same,  nor  the  angles  of  them 
to  differ  by  exactly  5 (mod  2 jt)  as  in  the  noise-free  case  discussed  above.  Neverthe¬ 
less,  we  would  expect  them  to  hold  to  a  reasonable  extent  if  we  assume  a  noise  of 
moderate  size.  The  method  we  propose,  which  is  presented  in  detail  in  the  next 
section,  is  based  primarily  on  this  consideration. 

In  order  to  estimate  0  and  T,  we  first  consider  a  fixed  candidate  angle  of  rotation, 
0O,  and  try  to  find  approximate  matches  of  the  difference  vectors,  one  from  each 
image  frame,  by  two  criteria,  one  for  the  lengths  and  the  other  for  the  angles,  that 
are  similar  to  the  ones  used  in  the  noise-free  case  above  except  that  these  are  more 
flexible  depending  on  the  size  of  the  noise  involved.  Any  candidate  pair  of  difference 
vectors,  one  from  each  image  frame,  is  removed  from  further  consideration  unless  it 
satifies  the  criteria  to  a  prespecified  extent  that  is  set  in  advance  according  to  the 
knowledge  of  the  size  of  the  noise. 

To  each  of  the  surviving  matches  of  difference  vectors,  we  then  assign  a  non-zero 
weight  according  to  the  angle  compatibility  with  respect  to  0O.  We  now  go  further 
by  considering  all  matches  of  two  triangles,  each  one  formed  by  three  observed 
points  in  each  image  frame,  that  satisfy  the  following  admissibility  condition  with 
respect  to  0O: 

A  match  of  two  triangles,  each  one  formed  by  three  observed  points, 
say  Ai,  Bi ,  C,-  in  each  image  frame  t,  i  =  1, 2,  is  admissible  if  all  the  three 
pairs  of  difference  vectors  ( D{AU  Bi),  D(A2 ,  B2)) ,  (D{B\,  Ci),  D(B2,  C2)), 
(D(Ci,  Ai),D(C2,  A2))  are  surviving  matches  of  difference  vectors. 

Then  we  assign  to  each  admissible  match  of  triangles,  a  weight  which  is  the  geomet¬ 
ric  mean  of  the  three  weights  associated  with  the  constituent  surviving  matches  of 
difference  vectors.  Now  by  summing  all  such  weights  over  all  the  admissible  matches 
of  triangles,  we  get  a  total  weight  for  the  fixed  candidate  angle  0O. 


Again  it  should  be  noted  that  a  triangle  may  form  admissible  matches  with 
several  triangles  even  for  a  common  value  of  &q. 

Now  we  first  maximize  the  number  of  admissible  matches  of  triangles  over  the 
possible  values  of  0o ,  and  then  over  the  set  of  all  such  maximizing  values  of  B0,  we 
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select  the  value  6q  of  Bq  that  maximizes  the  total  weight.  Our  final  estimate  0  of 
$  will  be  a  weighted  average  of  the  differences  in  two  angles  of  surviving  matches 
of  the  difference  vectors  forming  the  admissible  matches  of  triangles  for  80,  weights 
being  the  ones  with  respect  to  Bq.  The  final  step  is  to  get  an  estimate  T  of  T  in 
exactly  the  same  way  as  we  did  in  the  noise-free  case  before.  Note  that  if  there  is 
no  admissible  match  of  triangles,  we  simply  return  the  answer  “Two  image  frames 
do  not  seem  to  show  the  same  object”. 

It  should  be  noted  that  if  the  prominent  points  on  the  object  form  the  vertices 
of  a  regular  polygon,  we  may  have  lack  of  identifiability,  i.e.,  there  may  be  several 
possible  estimates  consistent  with  the  data. 

3.3  Method  of  Estimation 

In  the  last  section,  we  claimed  heuristically  that  the  differences  in  lengths  and 
angles  of  each  pair  of  difference  vectors  may  play  a  major  role  in  establishing  the 
admissible  matches  of  observed  points  in  two  noisy  image  frames  taken  at  times 
t\,  and  t2  respectively.  Now  to  support  that  claim,  we  will  discuss  in  detail  the 
approximate  behavior  of  the  differences  in  lengths  and  angles. 

3.3.1  Assumptions 

Suppose  that  the  physical  points  Pi,...,Pn  are  the  n  distinct  prominent  points 
on  the  object  in  which  we  are  interested.  For  simplicity,  but  without  any  loss  of 
generality,  let  us  assume  that  the  first  mo  of  these  points  are  observed  in  both  image 


1  I.'  i 


frames,  the  next  mi-mo  points  only  in  image  frame  1,  and  the  next  mj  —  m0  points 


only  in  image  frame  2.  Since  our  data  consist  of  the  noisy  locations  of  the  points 


observed  in  the  two  image  frames,  again  without  loss  of  generality,  we  will  assume 


that  there  are  only  n  (<  n)  prominent  points  to  begin  with,  where 


n  :=  mo  +  (mi  —  mo)  4-  (m 2  —  mo)  =  mi  +  m2  -  mo. 


After  making  these  assumptions,  let 


Xo  :=  ((xi, yi),  •  •  • ,  (x*,  y»)) 


be  the  true  location  of  the  n  observed  points  Pi, . . .  ,Pr  in  the  image  frame  coordi¬ 


nates  at  time  ti  in  that  order.  Also  let 


Xi  :=  ((Xii,yn),...,(Ximi,yim,)) 


be  the  observed  locations  of  the  mi  points  Pi, ... ,  Pm,  in  image  frame  1,  and 


X,  :=  ((x2i,y2i),...,(x2ma,y2m,)) 


(3.10) 


be  the  observed  locations  of  the  m2  points  Pi, . . .  ,Pmo>  Pm,+i,  •  •  •  ,Pa  in  image 


frame  2  in  their  respective  orders.  However,  remember  that  what  we  have  as  actual 


data  are  two  unordered  sets  of  observed  locations,  one  from  each  image  frame. 


Our  independent  Gaussian  noise  assumption  can  be  expressed  in  the  following 


matrix  notation: 


All  components  of  Xx  and  X2  are  independently  distributed  with 


(I  £  D  - 


i  =  1 . m„ 


(3.11) 


J-l,...,m„  (3.12) 


I  ^  f 


VVVi.vv-iyv y.vy.y.  .'.y.v-Vv 


an< 


Ihj  ■=  \ 


uw  i  *; 


U(9) 


+  T  ifj  =  l,...,m0 

+  T  if j  =  mo  +  l,...,m2 


xi+mi-mo 
[_  Vj+mi  —tHO 

The  notational  correspondence  is  described  again  in  Table  3.2. 


(3.14) 


3.3.2  The  Length  Difference  and  Angle  Difference 

Let  us  consider  any  two  prominent  points  P*,  Pj  with  1  <  k  <  l  <  mo,  so  that  they 
are  among  the  common  observed  points.  Let  us  fix  k  and  l  for  the  time  being. 

We  first  look  at  their  true  locations  fiik,  fiit  in  each  image  frame  *,  i  =  1,2.  We 
have 


and 


Pu-Mi* 
X|  -  xk 
yi-yk 


(3.15) 


Clearly  we  know 


D{lhk,lhi) 


fhi-lhk 


U(#) 


Xj  -  xt 

yi-yk  n 


\\D{lhk,lhi)\\  =  II^(Mi*.Mii)H. 


(3.16) 


(3.17) 


namely,  the  true  difference  vectors  corresponding  to  the  same  prominent  points  Pk, 
Pi  in  the  two  image  frames  have  the  same  lengths.  Moreover  if  we  consider  their 
angles , 

<Pi  :=  arctan(L>(Mik>M,i)),  *  =  1*2,  (3.18) 
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Table  3.2:  The  notational  correspondence.  Image  frames  1  and  2  taken  at  times 
ti  and  tj,  respectively.  “NA”  represents  the  nonavailability  of  the  information. 
n  =  n*i  +  m2  —  thq. 


i 
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then  we  have 


or  equivalently, 


<Pi  =  c{<Pi  +  0), 


0  =  e(<pa  -  <pi) 


(3.19) 


if  we  add  the  restriction  that  0  €  (— In  other  words,  the  angle  0  of  rotation 
can  be  recovered  from  the  difference  <p2  —  v?i  of  the  two  angles,  one  from  each  image 
frame,  of  the  true  difference  vectors  corresponding  to  the  same  prominent  points 

Pk ,  Pi- 

Now  we  look  at  the  observed  locations  (X,*,  Yik),  (Xu,  Yu)  of  Pk,  Pi  in  each  image 
frame  i,  and  denote  them  by  P,*,  P,j  respectively,  for  »  =  1,2.  Then  it  is  easy  to  see 


£{D(Pik,Pu)]  =  X(D(nik,iiu),2o2h),  for  i  =  1,2, 


(3.20) 


and  hence  we  get 


-  =  x\  (- (y-~* )  ■  for  .  =  1,2,  (3.21) 


where  x 2  (^2)  is  the  non-central  Chi-square  distribution  with  q  degrees  of  freedom 
and  non- centrality  parameter  6 2.  Since  a  \\  (6a)  random  variable  has  mean  q  +  67 
and  variance  2 q  +  4 S2  (see  [Johnson  70,  pp.  130-135]  for  derivation),  we  get  for 


*  =  1,2, 


„IHD(Pik,Pu)\l2]  n  ,  ||P(A*,fc,*,)l|2 

E  [ - 2^ - J  -  2+ - 2? - ’ 

2o2  o2 


(3.22) 


(3.23) 


It  follows  that  for  i  =  1, 2, 


E  [||D(P,.,P„)f]  =  4<r’  +  ||D(m,.,«„)H:, 


(3.24) 
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and 


iwwmiwwiMuwiwwMmiuiwiwjwwM 


Var  [||X>(/»i»,/»s,)|lI]  =  16<t*  +  Sff3  ||Z5(^*,Mii)l|1-  (3.25) 


In  the  next  subsection,  we  will  derive  approximation  formulae  for  E  [||Z?(Pa,  Pu)  ||], 
Var[||D(P,*,Pljj)||],  E[arctan(D(Pi,Pi))],  and  Var  [arctan  {D{Pik,  P#))],  for  suffi¬ 
ciently  small  o  >  0.  To  do  that,  let  us  first  simplify  the  notation  by  taking 


A 

Dot 
di 
d<x 

for  i  =  1, 2.  Then  (3.20)-(3.25)  become 


C(A] 

c 

’  •% 

2  a2 

E 

<*< 

2aJ 

Var 

A 

2a\ 

EW 

Var  [d?] 

for  *  =  1, 2. 

3.3.3  Approximations 


D(PiktPi,) 

(3.26) 

D(nik,Pu) 

(3.27) 

II A  || 

(3.28) 

II  Ax  || 

(3.29) 

*(Ax,2o2I2), 

* 

(3.30) 

*5  (£)  • 

(3.31) 

2+^- 

(3.32) 

,  2 

4  +  —r, 
a * 

(3.33) 

4a2  +  d«, 

(3.34) 

16o4  +  8a  2d$i, 

(3.35) 

We  begin  this  subsection  with  a  statement  without  proof  of  the  following  theorem 
which  justifies  a  method  called  the  “8-mttkod”  for  finding  the  approximate  mean 
and  covariance  of  a  function  of  a  random  variable.  For  a  slight  variation  and  a 
discussion,  see  [Bishop  75,  pp.  492-494]. 
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;  z  *  - u,  i  i  •  *  j  *^. ****** *  laaBjaanjaaflamaM^Mij 


Theorem  3.1  (6-method)  Let  VT  be  an  m- dimensional  real  vector  valued  random 


variable,  Vq  €  Rm,  and  let 


t(v)  := 


be  a  d-dimensional  real  vector  valued  function  defined  on  some  neighborhood  N  of 
v0  in  Rm  such  that  the  Jacobian  matrix  dt(v)/dv  exists  and  is  continuous  on  N. 


Suppose  also  that 


c  Vr--v°  -»rjw] 


as  r  —*  0, 


for  some  random  variable  W. 


Then  i(VT)  admits  the  first  order  Taylor  Expansion 


t(Vr)  =  t(v  0)  + 


(Vf  -  v0)  +  op{t)  as  t  ->  0. 


(3.36) 


(3.37) 


Vr  — »  v0  in  probability  as  t  -*  0, 
t(Vr)  — ♦  t(v0)  m  probability  as  t  —*  0, 


(3.38) 

(3.39) 


r  dv  V=V0  - 


as  t  — ►  0. 


The  conclusion  of  Theorem  3.1  is  especially  useful,  when  we  have 


(3.40) 


£  [W]  =  A/(0,E), 


(3.41) 


where  it  follows  that 


'at(v) 


W  =  JV  0, 


dt(v) 


dt(v) 


We  can  interpret  this  as  saying  that  for  small  r  >  0, 


(3.42) 


V  (3.43) 


•  tT*' 


and  hence  f(VT)  behaves  like  a  Gaussian  random  variable  t'  for  which 


E[f]  »  t(v o), 


(3.44) 


•  i  ~  .1  dt( •) 


Cov[t*|  »  r 


r 


(3.45) 


First  we  apply  Theorem  3.1  with  Vf  =  df,  v0  -4i.  t(y)  =  y/v  and  r  =  a.  To 
do  so  we  must  find  the  limiting  distribution  of  (d?  —  d^)  / a.  We  have 

=  where  _  4. 

a  a  2a 1 

=  ^[(Zx  +  $)l  +  Z2l-$J]  where  £  ^  ^  ^  =  JV(0,I2) 

=  2<r  ^2SZi  +  ( Z *  +  ^j)j  =  2\/2do,Zx  +  Oj,(<7). 


Hence 


<*?  ~  < 

<r 


£[WJ»  *(0,***,). 


Therefore  from  (3.43),  we  get  for  small  a  >  0, 


£|*l  *  H  (<*».**(  jjj)*(*4))  =  X  (<*«,2<r’)  .  (3.46) 

Again  we  apply  Theorem  3.1  with  Vr  =  A,  «o  =  Aw,  <(t>)  =  arctan(v)  and 
r  =  a.  In  this  case,  it  is  obvious  that 

Z  Di  ~  Doi  =  >/(o,2I2)  -  J/(0,2I2). 

a 


Now  we  have 


3t(t»)  _  darctan(vx,v2)  darctan(vx,v2) 

dv  dvi  '  dv2 


=  -^|cos2(arctan(vx,  v2)),  ^-cos2(arctan(vi,  v2)) 

v2  t>x 

vl  +  v2  ’  V1  +  V|  ’ 
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and  hence  from  (3.43),  we  get  for  small  a  >  0, 

2f_ 

It  should  be  noted  that  except  for  the  case  where  arctan(Poi)  =  *■,  the  approxi¬ 
mating  normal  distribution  in  (3.47)  assigns  positive  probability  outside  the  interval 
(-x,x],  but  that  this  probability  approaches  zero  as  a  — ►  0,  while  the  actual  distri¬ 
bution  is  confined  to  ( — tt,  tt],  ».e.,  a  distribution  on  the  circle.  The  exceptional  case 
may  be  treated  by  translating  the  probability  in  the  approximating  distribution 
over  (ir,2ir)  to  (— ir,  0).  However  the  need  for  this  peculiar  distribution  on  the  line 
will  disappear  below,  when  we  consider  the  distribution  of  a  difference  of  angles 
which  centers  about  0. 

Since  D2  and  D2  are  independent,  and  so  are  d2  and  d2,  we  see  from  (3.46)  and 
(3.47)  that  for  small  a  >  0, 


(3.47) 


£[arctan(Dj)]  as  M  ^arctan(Do«), 


C[d2  -  di]  as  U  (d02  -  d0i,4a2)  =  U  (o,  4a2)  (3.48) 


since 


dm  —  dt 


01 


from  (3.17).  Also 


£[arctan(.D2)  —  arctan(i?i)| 


(  2<r2  2o2  N 

M  arctan(i?02)  -  arctan(Z>0i),  -tt-  +  -75- 

\  “01  d03 1 


M  (  arctan(J902)  —  arctan(D0i)» 


4ct2 

dl 


(3.49) 


01 


or 


since 


£[c(arctan(.D2)  _  arctan(Z?i)  -  5)]  as  M 


(3.50) 


c(arctan(2?02)  -  arctan(jDoi)  -  0)  =  0  (3.51) 


from  (3.18)  and  (3.19). 


«'<  «'t  iN  j*  .i|  Ji  .<»  .«*'  »» 


J I*  »*“  V  h'  J» '  J**  I*] -»|V|'  -tj* .1* 


Therefore,  we  may  conclude  from  (3.48)  and  (3.50)  that  for  small  a  >  0, 


da  —  dx 
2<7 


«  J/(0,1) 


(3-2) 


OUU 

£  -  Wg]  „  tl  (0, 1)  .  (3.33) 

2a 

We  may  go  one  step  further  by  replacing  den  in  (3.53)  by  the  estimate  (dx  +  d2)/2 
in  view  of  (3.46)  and  get 

£  [f  (arc*n!?-i  --  gggtgilzil a  H  (0,1) .  (3.54) 

2a  2 

The  approximation  results  (3.52)  and  (3.54)  will  play  the  key  roles  in  the  esti¬ 
mation  procedure  presented  in  the  next  subsection. 

3.3.4  Estimation  Procedure 

The  two  approximate  distributional  results  (3.52)  and  (3.54)  provide  us  with  valid 
statistics  with  which  we  can  measure  the  significance  of  the  correspondence  between 
any  pair  of  difference  vectors,  say  D{PU,  Pii).  D{P2k,  Pu),  of  the  observed  locations 
Piit  Pij  in  image  frame  1,  and  Pj*,  P2i  in  image  frame  2,  and  thus  enable  us  to 
assign  a  weight  to  each  correspondence.  To  do  that  let  us  first  define  functions 
h(u),  pi(r)  and  p2(r)  that  are  needed  in  our  weighting  scheme,  by 


Mr) 

Mr) 


=  max(2.5  —  uJ,0)  for  u  €  R1, 
=  y/.2  +  r2  for  r  >  0, 


(3.55) 

(3.56) 


=  — 4.119(r+  1)  exp  {-[r  +  1)}  +  1.915  for  r  >  0.  (3.57) 


The  roles  of  these  functions  are  to  be  explained  later  after  we  indicate  how  they  are 
used  in  our  weighting  scheme  (see  Figure  3.2  for  their  graphs). 

In  the  following  steps,  let  us  describe  the  estimation  procedure,  where  we  taxe 
the  value  of  a  to  be  aa  =  ka  for  k  >  0.  The  effect  of  this  misspecification  of  a  when 
k  7^1  will  be  discussed  in  the  next  two  chapters. 


</■ 


>.y.  ^ 


uwmuiuuiuinjuinun  ware?  wwwiwj  ■  wwwarefwwmwwwwTOTO 


To  make  the  procedure  feasible  on  a  computer,  we  fix  a  finite  grid  T  C  (— x,x] 
of  candidate  angles  $o  of  rotation.  Specifically,  in  the  simulations  in  Chapter  5, 

T  =  {(.01)n  |  n  =  0, ±1, . . . ,  ±314}  (3.58) 

STEP  1  i4sstyn  an  arbitrary  order  to  the  observed  locations  in  image  frame  1,  and 
label  them  by  that  order,  Pn, . . . ,  Plmi.  Similarly  label  the  observed  locations  in 
image  frame  2  by  P2 1, . . . ,  P2m, , . 

STEP  2  For  each  pair  of  observed  locations  (Pu,Pij),  1  <  *  <  j  <  mi,  in  image 
frame  1  and  each  pair  of  observed  locations  {P2k,P2i),  1  <  k  ^  l  <  m2  in  image 
frame  2,  form  the  difference  vectors  D(Pu,Pij)  and  D(P2klP2i),  and  then  compute 


Pi»y 

(3.59) 

Riu 

:=  P(P«,A.)II 

(3.60) 

<PUj 

:=  arctan(D(PUlPij)) 

(3.61) 

<P2U 

:=  arctan[D[Plk,Pu)). 

(3.62) 

STEP  3  For  each  i,  j,  1  <  *  <  j  <  mi,  and  k,  l,  1  <  k  /  /  <  m2,  consider  the  pair 
of  difference  vectors  [. D(PU ,  Ay),  D{ P2k,  P2J)].  Call  [ ij ,  kl]  :=  [D{PU,  Pxj),D{P2k,  P2,)] 
to  be  a  candidate  match  of  difference  vectors  if 

h  (~2 ^~)  >  °-  (3-63) 
Then  denote  by  C,  the  set  of  all  candidate  matches  of  difference  vectors,  [tj,  kl\. 

STEP  4  Denote  the  ordered  triple  ([t'i*j,  Jijj],  [*i*'s,  jijs],  [*at's,  j2js])  £  Cs  by  [t*,  j)  := 
[*i*2*s, iiJjJs],  and  call  it  a  candidate  match  of  triangles.  Let 

C*  :=  {[r,  J]  e  Cs}  (3.64) 


be  the  set  of  all  candidate  matches  of  triangles. 


STEP  5  Fix  a  candidate  angle  of  rotation,  80eT. 

STEP  0  Denote  the  set  {(1,2),  (1,3),  (2, 3)}  by  (123).  For  each  [r,j]  €  C *  and  each 
(s,t)  €  (123),  compute  the  weight  of\itiuj,jt\  for  O0  by 

u;[».*«.i.i*](^o)  •—  Pi  2 

(3.65) 

If  Wi.i«j.;i](^o)  >  0,  then  call  [*',*«,  j,jt]  a  surviving  match  of  difference  vectors  for 
0O,  and  let 

S(0O)  :=  {{i.it,  jjt]  €  C  |  [t,j]  E  C\  (s,t)  €  (123),  tqw^jpo)  >  0}  (3.66) 

be  the  set  of  all  surviving  matches  of  difference  vectors  for  0O. 

We  digress  here  to  elaborate  on  the  roles  of  the  functions  h,  pu  and  p2.  As  can 
be  seen  from  Figure  3.2  (a),  h{u)  is  a  non-negative  even  function  which  serves  the 
following  two  purposes. 

First,  h  serves  as  an  “aperture”  through  which  the  argument  u  has  to  pass  in 
order  to  att?in  a  non-zero  h  value  at  all.  Specifically,  it  sets  \/2.5  as  1.5811  as  the 
upper  limit  by  which  |u|  is  bounded  to  attain  a  non-zero  h  value.  In  particular,  if 
we  have  a  real  random  variable  Z,  with  E[Z]  =  0  and  Var[Z]  =  1,  in  place  of  u, 
then  we  can  interpret  this  as  saying  that  only  the  observations  Z  within  ±1.5811 
standard  deviations  from  the  expected  value  will  get  the  non-zero  h  values.  Note 
that  the  probability  of  such  an  event, 

P{h[Z)  >  0}  «  P{|Zj  <  1.5811} 

is  approximately  .886  if  £[Z]  «  A/  (0, 1).  Remember  that,  from  (3.52)  and  (3.54),  if 
[tj,  fc/]  is  a  true  match  and  oa  =  a,  then  we  have 


and 


c  {<Pni  —  <Puj  ~  do)  Ruj  +  Rtu 


«  >/(0, 1). 


2cr0  2 

Secondly,  A,  by  itself  can  be  used  as  a  weight  of  a  special  kind  assigned  to  the 
argument  value  u.  The  resulting  weights  will  have  the  property  that  as  |u|  decreases 
from  1.5811,  the  weight  increases  sharply  and  reaches  a  smooth  maximum,  namely 
2.5,  at  u  =  0.  This  property  insures  that,  once  a  value  of  u  passes  through  the 
non-zero  aperture,  it  will  get  a  weight  which  is  not  too  close  to  zero,  and  at  the 
same  time,  the  values  of  u  which  are  close  to  0  will  get  almost  the  same  weights  as 
u  =  0  does. 

The  functions  p\  and  pj  are  used  to  adjust  the  basic  weighting  scheme  provided 
by  h  in  order  to  incorporate  the  following  idea.  Suppose  that  we  have  defined,  in 
STEP  6,  the  weight  of  [*,*«,  j,jt\  for  6Q  by 


u/. 


e  {'Pli.n  ~  Vu.i,  ~  *o)  Rua  +  Rii 


i'it  ^ 


(3.67) 


2  oa  2 

This  weight  fails  to  take  into  account  that  matching  long  difference  vectors  provide 
more  accurate  information  about  9  than  do  short  ones.  So  by  defining  the  weight 
by  (3.65)  instead  in  STEP  6,  we  enlarge  the  non-zero  aperture  for  the  pairs  of  large 
difference  vectors  using  p2,  and  at  the  same  time,  we  get  a  weight  roughly  of  the 
order  of  the  average  length  of  the  two  difference  vectors  using  pt.  The  reason  that 
we  use  y/.2  +  r 2  for  Pi(r)  instead  of  r  is  to  bound  the  function  away  from  0  in  the 
neighborhood  of  r  =  0.  For  P2(r),  functions  of  the  form 


-a(r  +  1)  exp  {-(r  +  l)}  +  b 


were  first  considered  in  order  to  get  a  point  of  inflection  at  r  =  1,  and  then  the 
coefficients  a,  6  were  determined  to  satisfy  the  conditions 


p2(0)  =  .4  and  p2(l)  =  .8. 
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The  choice  of  these  functions  is  somewhat  arbitrary. 

We  now  describe  the  next  estimation  step. 

STEP  7  For  each  [i*,  j]  6  C* ,  if  [r,J]  €  $(0O)S,  then  call  it  an  admissible  match  of 
triangles  for  0O,  and  let 

Wo)  :=  {(•-,  j)  €  C  \  [r.il  e  S(«o)s}  =  c-  n  Wo)J  (3.68) 

be  the  set  of  all  admissible  matches  of  triangles  for  8q. 

STEP  8  For  each  [i*,  g\  €  A (6 a),  define  its  weight  for  0O  by 

«'[r.jl(0o)  :=  (  II  1/3  (3-69) 

(*,«)e<iJ3) 

and  let 

w(°o)  :=  H  wl»".Jl(^o)-  (3-70) 

STEP  9  Let 

N(0 0)  :=  M(*o)|  (3-71) 

be  the  number  of  the  admissible  matches  of  triangles  for  80.  Define 

©rn«  :=  {00  e  r  I  N($ o)  =  maxiV(0o)  >  0}.  (3.72) 

*o€r 

U  ©max  =  0,  then  return  the  answer  “ Two  image  frames  do  not  seem  to  show  the 
same  object .* 

STEP  10  Let  6o  6  ©max  the  smallest  value  in  T  of  0q  such  that 

w(0o)  =  max  u/(0o)- 

#oee«„ 

Remark:  Usually,  there  will  be  only  one  0O  satisfying  (3.73). 


(3.73) 


,f  1,4  .f 


STEP  11  For  each  [t,  j]  6  A(0o)  and  each  (s,t)  €  (123),  define 

V4r.JlM  :=  c(<P3m,  -  <Pu.i ,  -  *o). 


Then  define  the  final  estimate  9  of  the  angle  of  rotation  6  by 


0  :=  On  + 


3u/(0o) 


Note  that 


«'($>)  =  J2  wir.M*o)- 

[r.J\eA(i0) 

STEP  12  Define  the  estimate  T  of  the  translation  vector  T  by 

f  -=7tkn  £  (t  ft*  -  u<«)  t  ft*)- 

3"(9o)  Wl6/(i.)  *-■  *-» 


(3.74) 


(3.75) 


(3.76) 


Chapter  4 


Maximum  Likelihood  Estimate  in 
the  Recognized  Points  Problem 
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In  the  estimation  method  we  proposed  in  the  previous  chapter,  we  have  assumed 
that  the  observed  points  in  the  two  image  frames  are  not  individually  recognized. 
In  principle  one  might  apply  maximum  likelihood  to  the  estimation  problem,  but 
the  difficulty  in  formulating  and  calculating  the  likelihood  function  under  the  above 
assumption  is  formidable.  However,  one  ought  to  expect  to  do  better  when  all  the 
observed  points  are  recognized  so  that  the  m0  common  points  among  them  in  the 
two  image  frames  are  matched  without  error  in  identification.  Hence,  this  latter 
situation,  which  is  readily  treated  by  maximum  likelihood  as  will  be  seen  in  this 
chapter,  should  provide  us  with  lower  bounds  for  the  error  in  our  estimates  for  the 
original  problem.  We  refer  to  the  two  problems  by  the  labels,  “  Unrecognized  Points ” 
and  “ Recognized  Points "  problems,  respectively. 

In  this  chapter,  we  will  prove  the  existence  of  the  maximum  likelihood  estimate, 
and  derive  its  asymptotic  normality  and  consistency  as  a  -*  0,  for  the  Recognized 
Points  problem.  Similar  results  are  derived  even  when  a  is  incorrectly  assumed  to 
be  oa  =  ka  for  positive  k  ^  1  in  constructing  the  likelihood  function. 

We  continue  to  use  the  same  notation  as  in  Section  3.3,  and  assume  that  m0  >  2 
throughout  this  chapter. 

4.1  Simplified  Assumptions  and  the  Maximum 
Likelihood  Estimate 

For  the  Recognized  Points  problem,  where  we  recognize  all  the  observed  points,  we 
can  assume  without  loss  of  generality  that 

m0  =  mi  =  m5  =  n(>  2),  (4.1) 

since  the  m,-  —  m0  unmatched  observed  points  in  each  image  frame  *,  i  =  1,2,  do 
not  provide  us  with  any  additional  information  about  the  motion  parameters  6  and 


T.  Accordingly,  let  us  assume  that  the  three  vectors 

Xo  :=  ((*l,yi),...,(Xmo.ymo))  €  RJm°  (4.2) 

X!  :=  {{XiuYu) . {XlmotYlmo))eR3m*  (4.3) 

X,  :=  ((Xu,yii),...,(x2mo,y2mo))  eR2mo  (4.4) 

are  the  true  locations  at  time  t\,  the  observed  locations  in  image  frame  1  (at  time 
*i),  the  observed  locations  in  image  frame  2  (at  time  t2),  respectively,  of  the  m0 
distinct  prominent  points  Rl,...,Rno  in  that  order.  For  notational  simplicity,  let 


us  define  the  vector  of  observations  X  by 


r  xt  1 

•"  .  X2  . 


eR4mo. 


Similarly,  let  us  define  the  vectors  m,  ji  by 


lh  :=  :  G  R2m° 


#ij  :=  €  R2m° 


is  :=  I^JgR4”*.  (4.8) 

Also  define  the  true  parameter  /3  representing  the  true  locations  of  the  distinct 
prominent  points  Pi,...,  P ^  at  time  t2,  the  angle  of  rotation  0,  and  the  translation 
vector  T  by 


Then  we  can  write 


where  g(0)  is  defined  by 


0  6  R2mo+s. 

T 


th  =  M 


(4.10) 


g{fi)  :=  (UW®Un1+rH 


MW? 


U(0)  0 


o  u(0)  •.  :  #h+  :  €R2mo  (4.11) 

:  0  T 
0  •••  0  U(0) 


Hence  we  have 


M=G(«, 


(4.12) 


where  G(0)  is  defined  by 


Mi 

•"  m 


€  R4mo. 


(4.13) 


The  independent  Gaussian  noise  assumption  that  we  made  in  Subsection  3.3.1 


can  now  be  written  as 


£[X]  =  4X|0]  = 


(4.14) 


and  hence  the  probability  density  function  /(x;/3)  of  X  is  given  by 


1  1(x-G(«)t(x-G(W 

:  p^yi^exp  ft - 7 - 


(4.15) 


Therefore  the  log-likelihood  function  L(ff)  of  X  is  given  by 


Mfi)  ■■=  £(X;«:=  log/(X;U) 


=  -2m0iog(2 

2  (7‘ 


(4.16) 


for  any  0  €  R2mo+s. 

The  maximum  likelihood  estimate  of  0  is  defined  to  be  the  value  0Mh  :=  ^ML(X) 
of  0  which  maximizes  /(X;  0),  or  equivalently,  L(0).  Note  that  0ML  is  a  solution  of 


the  likelihood  equation 

am .  0 

d'0 

where  dL(0)/d0  is  the  row  gradient  vector  of  L[0).  Also  0Mt  satisfies 


(4.17) 


Pml(G(0))  =  0 


(4.18) 


since  we  have  from  (4.16),  for  any  0  ^  0, 


£(<?(«;«  -  £(G(/>);«  =  li2iSLl°ML}2ietz. GW)  >  0, 

where  the  strict  inequality  follows  from  Lemma  4.2  to  be  proved  in  the  next  section. 

A 

In  the  next  section,  we  will  derive  the  existence  of  0ml,  and  its  asymptotic 
properties  as  a  — ♦  0. 

The  proof  of  a  strengthened  result  to  the  effect  that  /3ML  is  unique  with  proba¬ 
bility  1  will  be  deferred  to  Appendix  A.l. 

4.2  Asymptotic  Normality  and  Consistency  of  the 
Maximum  Likelihood  Estimate 


In  this  section,  we  assume  that  the  true  a  is  known  unless  otherwise  stated. 

First  note  that,  from  (4.16),  we  have  by  differentiation, 

T 

2  v—  (4-19) 


dL(0)T 
— 


1  dG(0)  - 

a?  -  *sr (x - GW)- 


We  begin  with  the  following  lemma. 


Lemma  4.1  Let  X  be  an  m-dimcnsional  Gaussian  random  vector  whose  true  dis¬ 
tribution  is  given  by 

C[X}  =  }/(K(0),  E)  (4.20) 


for  some  0  €  Rd,  some  m  by  m  positive  definite  matrix  £,  and  some  function 
K  :Rd  — ►  Rm.  Let 

r  km 


K(0)  := 


Km{0) 


for  any  0  €  Rrf.  Suppose  each  Ki(0 )  is  twice  differentiable  with  respect  to  0. 


Then  the  log-likelihood  function  L(X;  0)  of  X  satisfies  the  following. 


dL(X;0)  T 


dL(X;0) 


10  =  0 


\0  =  0. 


=  -E 


a’i(X;3) 


l  \fi  =  fi\ 

PLObfi) 

d?  fr.fi)  =  (K(fi),fi) 


(4.21) 


•  (4.22) 
(4.23) 


The  matrix 


J:=  E 


dL(X;0)  T 


(X;  0)  8L(X;0) 

d0  0  =  0  dP 


0  =  0. 


(4.24) 


is  Fisher’s  information  matrix. 


Remark :  The  first  relationship  (4.21)  in  the  conclusion  above,  holds  under  more 
general  regularity  conditions  on  L(X;0)  which  allow  the  change  of  the  order  of 
differentiation  and  integration.  Also,  it  is  known  that  the  second  one,  (4.22) ,  holds 
under  somewhat  more  general  circumstances.  For  details,  see  [Kendall  77b,  pp. 
56-58]  and  [Huzurbazar  49]. 

Nevertheless,  we  will  write  out  the  proof  of  Lemma  4.1  under  the  assumptions 
given,  for  the  later  use. 

Proof  :  Since  £[X]  =  }J{K[0),J:),  we  have 


£(X;«  :=  -llog((2*ni:|)  -  1(X  -  K  (fi))  S -‘(X-K(fi)) 


(4.25) 


and  hence 


dL(X;0)T  dK(0)T 


E-^X -*(/?)). 


30  80 

Differentiating  one  more  time  with  respect  to  0  gives  us 


(4.26) 


d2L(X-,0)  _  8  \dK(0)T 


d0  I  d0 


J1-'{X-K(0)) 


£mV«+^V'(X -*(«),  (4.27, 

80  80  d0 
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where  we  used  the  notation  (daK{0)/d0  )  to  denote  the  row  “vector”  with  matrix 


components, 


a’g(i)T  a’fUi) 


V 


(4.28) 


with  the  understanding  that  it  be  treated  just  like  an  ordinary  m-dimensional  row 
vector.  By  substitution,  we  get 


J  =  E 


af!r-  £-‘(x  -  *(«)(*  -  K(p)f-z-' 

d"  0  =  0  a0  3- 


0  =  0 


dK{0)  T 


\0  =  0 


E-‘E  [(X  -  X(/>))(X  -  *((}))T]  ST1  _ 


10  =  0 


=  3JmT  E-Cov[X]E-«| 

aP  0  =  0  aP 

=  **  A T  E-»  aK[P) 

dP  0  =  0  a~P  0  =  0 


0  =  0 


(4.29) 


0iL(X;0) 

a02  0  =  0. 


=  E  I- 


dK(0)  T 


=-i  a*  A 


r 


1/1  =  0  d&  \0  =  0  dP  10  =  0 


E -l(X-K(/3)) 


a^A 

00 


,-i  3*  A 


52*(0) 


_  ajrAf 


0  =  0  |0  =  0  50  |0  =  0 


E_1E  [X  -  jFC (0)] 


aP  \0  =  0  aP  \0  =  0 


(4.30) 


32L(X;0) 

a~P  (X,0)  =  (K(0),0) 


dK{0) 


,-« a/r  A 


a2if(0)r 


aP  0  =  0  a~P  0  =  0  00  0  =  0 


E  ~1(K(0)-K(0)) 


dK{0) 


dfi 


,-i  dK(fi) 


/?  =  /? 


d/3 


0=0 


(4.31) 


This  completes  the  proof. 

Applying  Equation  (4.29)  in  the  proof  of  Lemma  4.1  to  our  case,  namely,  when 
the  log-likelihood  function  is  given  by  (4.16),  we  get 


J  = 


1  dG{0) 


d0 


dG{0) 

0  =  0  d& 


0  =  0 


(4.32) 


We  will  now  derive  the  existence  of  the  maximum  likelihood  estimate  0ML  and 
its  consistency  as  a  — *  0  for  the  Recognized  Points  problem.  To  do  that,  first  let 


n  :=  R2mo  x  (— 7T,  tt]  x  R2  C  Rlmo+s. 


(4.33) 


Then  the  space  O,  with  the  topology  on  (— ir,  ir]  being  the  identification  topology 
with  respect  to  the  covering  map  c*  :  S1  — ►  (-7r,7r],  (i.e.,  a  set  V  C  (— tt,  7t]  is 
open  if  and  only  if  c*~1(V)  is  open  in  Sl),  is  the  space  of  parameters  0  over  which 
we  want  to  maximize  the  likelihood  function  f(X;0),  or  the  log-likelihood  function 

L(0). 

We  need  the  following  lemma  to  proceed.  This  lemma  states  that  G{0)  is  one- 
to-one  on  that  subset  of  Q  corresponding  to  at  least  two  distinct  prominent  points. 
It  essentially  follows  from  the  fact  that  the  identity  is  the  only  rotation  about  0  in 
R2,  which  has  a  non-zero  fixed  point.  (See,  for  example,  [Choquet  69,  pp.  59-66].) 


Lemma  4.2  If 


0:= 


AT  • 

Ml 

~  1 

“  ~9 

Mi 

0 

,  and  0  := 

6' 

T  \ 

T 

e  n, 


and  represents  at  least  two  distinct  prominent  points,  then  G(0)  =  G(0)  implies 


Proof  :  Suppose  G(fi)  =  G(0).  Then  from  (4.13)  we  have 

Ml  =  ft,  (4-34) 

and 

gifi)  =  $(£')•  (4.35) 

Then  from  (4.11),  (4.34)  and  (4.35),  we  get 

(U(0)  ®  1^)/*!  +  T®lmo  =  (U(fl')  ®  ImojMi  +  f  ®  1^.  (4.36) 

Without  loss  of  generality,  we  can  and  will  assume  that  and  Mj2  are  distinct 
points.  Then  looking  at  the  first  four  components  of  the  vectors  in  (4.36),  we  get 

U(*)Mn+f  =  U(£')M!i  +  r',  (4.37) 

and 

U(*)Mn  +  f  =  Uffifiu  +  f '.  (4.38) 

Subtracting  (4.38)  from  (4.37)  gives 


or 

U(«')'1u(«)(S11-(i1!)=f,i-51I, 

and  hence 

U(« -£')(Mii  “Mu)  =  Mu  -  Mu- 
Since  Mn  #  Mj2,  we  must  have 


U(0-  8')  =  I2 


and  thus 

5  =  6'. 


(4.39) 
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Then  from  (4.37)  and  (4.39),  we  get 


This  completes  the  proof. 

We  will  state  without  proof,  the  following  theorem  which  proves  the  existence  of 
the  maximum  likelihood  estimate  based  on  n  i.i.d.  observations,  and  its  consistency 
as  the  sample  size  n  -*  oo,  under  certain  conditions.  Then  after  that,  we  will  show 

A 

how  the  theorem  applies  to  the  maximum  likelihood  estimate  0ML  in  the  Recognized 
Points  problem.  For  a  proof  see  [Chemoff  79,  pp.  52-54]. 


Theorem  4.1  Suppose  that  it  is  possible  to  extend  the  space  of  parameters,  Cl,  to 
a  compact  set  Cl*  such  that 

(i)  for  every  0  €  Cl*,  P is  a  subdistribution  with  density  f{x\~0),  (i.e.,  f(x-,0)  > 
0  and  f  f[x;0)  dx  <  1),  which  is  continuous  with  respect  to  0, 

(ii)  for  every  0  £  Cl*,  there  is  a  neighborhood  N-  of  0  such  that 

P 


E0  inf  logf^i^ 
P  0EN~  L/(x; 0) 


>  —oo, 


(iii)  for  every  0  €  Cl*  \  {0},  P £  ^  P g. 

Then,  for  every  neighborhood  N g  of  0,  0n,  the  value  of  0  E  Cl*  which  maximizes 
the  likelihood  based  on  the  first  n  observations,  exists  and  satisfies 


Pg{0„  &  N g  infinitely  often}  =  0 


ftv! 


•%V7 


Pfi{P«*Nfi}-+0  «  n  — ►  oo. 


(4.41) 


First  note  that,  in  the  Gaussian  case,  Theorem  4.1,  which  deals  with  the  maxi¬ 


mum  likelihood  estimate  when  the  sample  size  n  — ►  oo,  can  also  be  applied  to  the 


situation  where  we  have  a  fixed  sample  size  but  a  — ►  0,  for  the  following  reason,  as 


mentioned  in  [Johansen  84,  p.  77].  If  X^,. .  .  .X^  are  independently  identically 


distributed  with 


jC[XW]  =  i=l,...,n, 


X:=l£x(,) 


n“i 


is  a  sufficient  statistic  for  fi  and 


£[X]  =  >/(M,a2E) 


where 


a1  - ►  0  as  n  — ►  oo. 

n 


(4.42) 


Thus  if  we  interpret  the  role  of  n  to  be  that  of  a  2,  the  Gaussian  model  makes  the 


two  maximum  likelihood  problems  equivalent. 


Now  we  will  show  that  our  model,  defined  by  the  log-likelihood  function  L{P)  in 
(4.16)  with  the  parameter  space  fl  in  (4.33),  satisfies  the  condtions  of  Theorem  4.1, 

A 

and  as  a  result,  that  0ML  exists  and  is  consistent.  To  do  this,  let  us  first  note  that 
0  is  locally  compact  and  Hausdorff  so  that  we  can  add  a  new  point,  oo,  to  it  and  get 
a  new  space  Cl*  that  is  compact  ([Royden  68,  p.  168]).  In  fl*,  a  subset  is  open  if  it 
is  either  an  open  subset  of  fl  or  the  complement  of  a  compact  subset  of  fl.  Define 


the  subdistribution  at  oo  €  fl*  by  taking 


/(x;  oo)  :=  0  for  any  x  £  R4mo. 


(4.43) 


Then  condition  (i)  of  Theorem  4.1  follows  from  the  fact  that 


.  lim  /(x;  0)  =  0  =  /(x;  oo). 

0  —*  oo 

Note  that  for  any  0  6  fl, 

,og  [(x  ~  G(/,))T(X  ~  ~ (x "  a(wT(x  ~ 

>  -2ji[(x-G(«)r(x-G(«)]- 

For  0=oo,  the  above  log  is  intrerpreted  to  be  +oo  and  the  inequality  still  holds, 
thus 

E  '  inf  log  ~  -^2E  [****«.]  =  -2m0  >  -oo,  (4.44) 

and  condition  (ii)  follows.  Finally  condition  (iii)  is  satisfied  if  0  corresponds  to  at 
least  two  distinct  prominent  points,  since  it  is  obvious  that  ^  Pp,  and  for  every 
0  €  n  \  {/?},  the  mean  G{0)  uniquely  determines  P^,  and  thus  by  Lemma  4.2,  if 
0^0,  G(0)  ^  G(0)  and  P ^  /  Pq.  Therefore  all  the  conditions  of  Theorem  4.1  are 
satisfied  and  hence  we  have  the  existence  and  consistency  of  the  maximum  likelihood 

A 

estimate  0ML,  namely, 

Theorem  4.2  If  0  corresponds  to  at  least  two  distinct  prominent  points  (m0  >  2), 
then  0Mh  exists,  and 

A 

0ML  0  in  probability  as  o  — *  0.  (4.45) 

A 

Note  that  /JML  cannot  be  oo,  since 

/(X;3)  >  0  =  /(X;  oo) 

for  any  0  €  0. 

Now  we  need  the  following  theorem  to  derive  the  asymptotic  normality  of  0ML. 
For  a  proof,  again  see  [Chernoff  79,  pp.  19-21]. 


Theorem  4.3  Suppose  that  the  follovoing  assumptions  hold. 

A 

(0  0n^fi  in  probability  as  n  — ►  oo, 


(U)E£ 


30  1 0  =  0 


(iii)  r'(X-,0)  .  =0, 

30  0=  0_ 

(iv)  J\{0)  is  positive  definite, 


(v)  Ep[W{6)\  -+0  as  6^  0, 


where 


M0)  := 


a  log  f(xMT 


(X;  0)  d  log  /  (X;  0) 

P  0  =  0  dP  0  =  0\ 

d2  log  /(X;  0)  d*  log/(X;^) 

30  30  0=0 


and  ||  •  ||iup  represents  the  largest  component  of  the  matrix. 


W{6)  :=  sup 


(4.46) 


(4.47) 


Then  we  have 


£  [Vn(0n-0)\  -*  N  (0,Jfx(^))  as  n-*  oo. 


(4.48) 


Again,  by  the  same  argument  as  in  the  paragraph  immediately  following  Theo¬ 
rem  4.1,  we  can  apply  Theorem  4.3  to  our  small  o  situation.  The  only  exception  is 
that  the  conclusion  (4.48)  should  be  translated  into 


*  A  " 

£  —  JY  (O.a-V"1)  as  cr  — ►  0, 


(4.49) 


since  J\{0)  is  Fisher’s  information  matrix  corresponding  to  the  likelihood  function 
of  a  sample  X  of  size  1,  and  for  a  sample  of  size  n,  the  corresponding  information 
matrix  Jn(0)  becomes  Jn{0)  =  nJi{0)  so  that  Jf  *(/?)  =  nJ~1(0),  which  corresponds 
to 
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Now  we  will  show  that  the  conditions  of  Theorem  4.3  are  satisfied  by  Pml  P 
corresponds  to  at  least  two  distinct  prominent  points  (mo  >  2).  First,  condition  (i) 
follows  from  Theorem  4.2.  Condition  (ii)  follows  from  (4.19)  and  the  fact  that 
E  ^[X]  =  G{P).  To  show  that  condition  (iii)  is  satisfied,  let  us  first  note  the  following 


identity: 


=  f  (X-  m  &  1°8  /  (X;  0) 
dp  1  1  dp 


(4.50) 


By  differentiating  both  sides  of  (4.50)  with  respect  to  we  get 


aa/(X;/?) 
d0  p  =  p 


=  /(X;/?) 


d  log /(X;  p)  T 


\P  =  P 


a  log /(X;  ft 

dp 


p=p 


d*  log /(X;  P) 


I  P  =  P 


(4.51) 


Thus  by  dividing  both  sides  of  (4.51)  by  /(X;/?)  and  then  by  taking  expectations 


under  P,  we  get 


Eg 


dP  p  =  p 


=  E, 


d  log  /  (X;  P) 


\p  =  p 

d1  log  /(X;  P) 

dp2  p  =  P 


3  log /(X;  a) 

d'p 


P  =  P 


However  the  first  term  on  the  right  is  J  and  the  second  is  —  J  from  (4.21)  and  (4.24) 
in  Lemma  4.1.  Therefore  condition  (iii)  is  satisfied. 

The  proof  that  condition  (iv)  is  satisfied,  i.e.,  that  J  is  positive  definite,  will  be 
given  in  Lemma  4.3  in  the  next  section. 


Now  it  only  remains  to  show  that  condition  (v)  is  satisfied.  To  do  so,  it  suffices 


to  show  that  the  third  derivatives 


Q(X-,fi-,iJ,k)  :=  =  l . 2mo  +  3  (4.52) 

opiopjdpk 


are  uniformly  bounded  in  the  region  ||0  —  /9||  <  6  by  a  function  27 (X)  whose  ex¬ 
pectation  under  0  is  finite.  Indeed,  from  Equation  (4.27)  with  K  :=  G,  £  :=  a3, 
it  is  easy  to  see  that  Q(X;  0;i,j,k)  in  (4.52)  is  a  polynomial  in  the  components 
of  X,  G(0),  dGW/dfr,  dGW/dfij,  dG(0)/d0k,  d2G(0)  130,90^  d2G(~0)/d~0jd~0k, 
d2G(0)/d0id0k,  and  diG(0)/d0id0jd0k  of  degree  two,  and  hence  uniformly  bounded 
in  the  region  1/9  -  /9jj  <6,  for  all  *,  j,  k  —  1, . . . ,  2m0  +  3,  by  a  function  J7(X)  of  the 


form 


F(X)  :=  ||a(£)jj  ||X||  +  j|b(5)|| 


for  some  a(<5)  and  b(6)  e  R4m°.  Then 

E0|tf(X)|  <  ll«(<)ll(l|G(/»)ll+4m„<TE[M))  +  ||b(5)||  where  £[,)  =  A/(0,1) 

=  W«)||  (l|G(«l|  +  ^§m0c)  +  ||b(5)||  <  oo. 

Therefore  all  the  conditions  of  Theorem  4.3  are  satisfied  and  we  have,  except  for 
Lemma  4.3,  proved  the  asymptotic  normality  of  the  maximum  likelihood  estimate 


Summarizing  all  these,  we  have  proved  the  following  theorem. 


Theorem  4.4  Let  £(X;/9)  be  the  log-likelihood  function  defined  by  (4.16)  on  Cl 

Mi 

which  is  given  by  (4.33).  Let  0  :=  $  be  the  true  value  of  0  £  Cl  where 

T 

represents  at  least  two  distinct  prominent  points  (m0  >  2). 

Then  the  maximum  likelihood  estimate  0ML(X)  which  maximizes  L(X;  0)  exists 


and  satisfies 


A 

/9ML(X)  — ►  0  in  probability  as  o  — *  0. 


(4.53) 


and 


Remark:  Note  that  from  Equation  (4.32) 

_-i  r— i \dG(P)T  dG{fi)  l"1 

O  J  —  —  m  »  , 

dP  p  =  p  dP  p  =  p, 

and  is  independent  of  a. 

It  is  easy  to  see  that  with  probability  —►  1,  the  second  derivative  of  the  log- 
likelihood  function  L(p)  is  continuous  and  negative  definite  in  a  neighborhood  of 
P  and  hence  with  probability  — ►  1,  the  likelihood  equation  (4.17)  has  a  unique 
solution  in  this  neighborhood  as  a  — ►  0.  (See,  for  example,  [Foutz  77].)  However, 
as  mentioned  earlier,  in  the  last  section,  a  stronger  result,  to  the  effect  that  the 
maximum  likelihood  estimate  is  unique  wth  probability  1  can  be  proved.  That 
proof  will  be  deferred  to  Appendix  A.l. 

Now  Theorem  4.4  allows  us  the  approxmation 

£  [£ml(X) -/>]«*((>,  J-1)  (4.55) 

for  a  sufficently  small  a  >  0,  and  this  provides  us  with  J-1  as  a  measure  of  perfor¬ 
mance  with  which  our  estimators  6  and  T  can  be  compared. 

4.3  Fisher’s  Information  Matrix  J  and  Its  Inverse — 
Asymptotic  Covariance  Matrix 

In  this  section,  explicit  formulae  for  Fisher’s  information  matrix  J  and  its  inverse 
J~l  will  be  derived.  The  results  will  be  used  to  estimate  the  approximate  standard 
deviations  of  the  maximum  likelihood  estimate  discussed  so  far,  and  then  we  will  be 
able  to  compare  them  with  the  ones  from  the  sample  distributions  of  the  proposed 
estimator  obtained  by  simulation.  That  task  will  be  done  in  the  next  chapter. 
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First,  let  us  recall  from  Equation  (4.16)  that 


-.T. 


L(X;jj)  =  -2m0log(2?ro3)  -  l(x  G(ftll'(X  .  ■?(£)! 


where 


/? 

GW 


Mi 

0 

A* 

T 


€  n  c  RSmo+s, 


Mi 


(4.56) 


(U(^)®Imo)/i1+T®lmo 

Mi 

U($)  0  •••  0 

0  U(0)  : 

i  **.  o 

0  •••  0  u  {§) 


Mi  + 


r 

A* 

r 


6  R4to°.  (4.57) 


By  differentiating  both  sides  of  Equation  (4.57),  we  have 


dGjfi) 

dfi 


Ij»no  Ojnyjxi  OjnmxJ 

[u(»)®U,  (U(f  +  *)®L»)j*,  I,®1TO 


(4.58) 


and  hence  from  Equation  (4.32)  we  have 

1 T 


J  = 


1  9GW 


dfi 


dGjfi) 
jfi  =  0  d& 


fi  =  fi 


i2m0 


U(5)T®Imo 


Olx2mo  Mf  (U(f  +  *)T  ®Imo) 


I2®1J 


l2mo 


^2moXl 


[U^®!^  (U(f +  *)®Imo)Mi  I2®1 


2mox2 

fflo 


lP2x2mo  -t  -o’  -mo 

,rii™,+(U(8),'U(8))®I™  ((U(*)rU(f+*))®U)(*i  U(«)T®lmo 

=3Mr((u(f+*)ruw)®u,)  (*r((u(f+«)Tu(f+«))®u.)M,  MT(u(f+*)r®i™.) 

I  U(«)®C  (U(f+«)®1L)M,  i»®(iS.u.) 

(4.59) 


By  simplifying  the  last  matrix  above,  we  get 


21 


Imo 

T 


(U(f)®I„>, 


rf(U(f)'«U.) 


mFMi 


U(«)®1^  (U(f  +  *)®lJ>i 


U(fl)T  ®  1„. 

#*r( U(f  +  °f  ®  1m.) 

m0l2 


(4.60) 
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and  by  expanding  the  submatrices  out,  we  get 

j  _  _L  [  A  i 


A  B 

“  o*  BT  D  * 


(4.61) 


where  A,  B,  and  D  are  2m0  by  2m0,  2m0  by  3,  and  3  by  3  matrices  respectively, 
given  by 


A  := 


2  0  •  •  •  0 

0  2  • 

:  **.  **.  0 

0  •••  0  2 


(4.62) 


B  := 


-yi 

COS  6 

sin  6 

*i 

—  sinfl 

cos  6 

y»no 

cos  6 

sin0 

Xmo 

—  sin0 

cos  6 

(4.63) 


E^i(x?+y?)  -ES>iX,sintf-E,^1y.cosO  E&*,  cos* -Eltsin*- 
D  :=  -EfciXisinfl-E^y.cosfl  m0  0 

E^x.  COS  sin  0  0  m0 

(4.64) 

This  relatively  simple  partitioned  structure  of  the  matrix  will  allow  us  to  compute 
the  inverse  without  much  difficulty.  Before  we  try  to  invert  the  information  matrix 
J,  let  us  first  prove  that  it  is  positive  definite  and  hence  invertible  (non-singular). 
We  prove  it  in  the  following  lemma. 


Lemma  4.3  If  (i  corresponds  to  at  least  two  distinct  prominent  points  (m0  >  2), 
then  Fisher’s  information  matrix  defined  by  (4.24)  is  positive  definite,  for  any 

z  e  R2mo+s, 


Proof  :  First  note  that  for  any  z  €  R,m°+S, 


r  r  _T  1  dG(fi)  dG{0) 

iJz  =  z  — — * —  - » — 

a  dP  0  =  0  d0  0  =  0 


1  dG{0)  dG(0) 

—  - w—  Z  - » - 

C  a  A  QB 


L  dfi  0  =  0  J  L  dfi  '0  =  0 


L  52(0)  i 

ai  d0  0  =  0 


>  o. 


Now  assume  that 


zTJz  =  0 


Then  from  (4.65),  we  have 


8G{0) 


d0  \0  =  0 


z  =  Q 


where 


dG{0)  _  Ilmo  02moXl  02m<,x2 

~ap~  n  _  ,  -  l  UM  ®  (u(f  +  «)»!„„)»<,  1,01™, 


\0  =  0 

from  Equation  (4.58).  Let 


T 

z  i=  zi, . . .  t  zjfriQ,  c  . 


Then  it  follows  that 


L  Z2"V)  J 

2l  1 

(U(0)  ®  Imo)  :  +(U(f  +  l)®I^)Mi«  +(I2®lmo) 

.  Z2mo  . 


Clearly  [zu . . . ,  z2m«]  =  0,  and  if  o  ^  0,  then  we  get 


(4.65) 


(4.66) 


(4.67) 


(4.68) 


and  hence 


#*!*”«  1(U(j  +  «)T®lm0) 

which  implies  that,  for  all  *  =  1, ... ,  mo, 


(4.69) 


x« 

y. 


=  -o-‘U(|  +  *)T 


(4.70) 


i.e.,  all  the  (x,-,y,)’s  are  the  the  same  and  this  is  a  contradiction.  So  a  =  0  and  it 
follows  from  Equation  (4.67)  that  [6,  c]  =  0.  Therefore  z  =  0.  This  completes  the 
proof. 

Now  we  will  use  the  easily  verified  fact  that,  since  J  is  non-singular, 


J~x  =  a1 


A’1  +  FE-1Fr 
-E~lFT 


-FE"1  1 
E"1  j  ’ 


(4.71) 


where 


E  :=  D  —  (4.72) 

F  :=  A~*B.  (4.73) 


Since  we  are  only  interested  in  the  3  by  3  portion  J~1(0,T )  of  the  asymptotic 
covariance  matrix  J~x  corresponding  to  the  parameters  0  and  T,  we  only  have  to 
compute  e^E'1  where  E  is  defined  in  (4.72). 

By  a  long  but  routine  calculation,  we  get  the  following  symmetric  matrix,  of 


which  only  the  upper  diagonal  elements  are  shown. 
J’x{0,T )  :=  o’E'1 


2  a2 


m0{al+cl) 


1  ns  sinfl+My  cosfl  —  (jx,  cos0  —  /ivsin0) 

+  rf)  +  (m x  sin  0 + Hy  cos  0) 3  —  (fix  sin  0 + nv  cos  0)  (fis  cos  0  —  nv  sin  6) 

(a*  +  al)  +  (m*  cos  0  -  nv  sin  0 )2 

(4.74) 


where 


In  particular,  when  0  =  0,  as  will  be  the  case  in  our  simulation  study,  the  matrix 
(4.74)  becomes 


r- 1/ 


hT)  = 


2  a2 


m0(al  +  a*) 


1  Hv  ~Hz 

Hv  [a\  +  o\)  +  h\  —HzfJ-y 

L  -HzHy  {al  +  +  Hi  J 


(4.79) 


We  will  refer  to  the  diagonal  entries  of  this  matrix  by  for  use  in  the  next 
chapter. 

Now  we  will  discuss  the  case  when  a  is  incorrectly  assumed  to  be  cra  =  ko  for 
positive  k  ^  1  in  constructing  the  likelihood  function.  We  will  distinguish  this  case 
from  the  one  discussed  so  far  by  attaching  a  subscript  “a”  to  the  notation  that  has 
been  in  use.  Hence  the  log-likelihood  function  of  this  case  will  be  denoted  by  La(0) 
instead  of  where 


LM  :=  — 2m0.og(2™:)  -  1  (X  -  -  Cfl)). 


Then  it  is  clear  that,  for  both  L(fi)  and  La(/3 ),  maximizing  them  is  equivalent  to 


-  t. 


minimizing  (X  —  G(0))  (X  —  G(fi)).  Hence 


“  ^ML> 


(4.80) 


and  all  the  results  derived  so  far  for  /?ML  are  also  valid  for  ^MLa.  In  particular,  ^MLa 

A 

will  have  the  same  asymptotic  distribution  as  0ML,  as  the  true  value  of  a  —*  0,  and 
J~l  will  be  the  asymptotic  covariance  of  it  regardless  of  the  value  of  k  >  0. 


Chapter  5 
Simulation  Study 


To  compare  the  performance  of  the  proposed  estimator  empirically  with  that  of  the 
maximum  likelihood  estimate  discussed  in  the  last  chapter,  simulations  of  two  sepa¬ 
rate  cases  have  been  carried  out  by  generating  random  prominent  points,  Gaussian 
noise,  and  observed  locations  of  points  in  two  image  frames. 

As  the  pseudo-random  number  generator,  we  used  the  one  developed  by  Wich- 
mann  and  Hill  [Wichmann  82],  and  as  the  Gaussian  random  number  generator,  we 
used  the  one  by  Box  and  Muller  [Box  58]. 

5.1  Method  of  Simulation 


Without  loss  of  generality,  we  assumed  that  the  true  locations  of  the  prominent 
points  are  within  the  open  unit  disc, 

{(x,y)  €  RJ  |  x2  +  yJ  <  1}, 

and  also  that  there  was  no  motion  involved,  i.e.,  0  =  0,  T  —  0,  for  in  view  of  the 
invariance  of  the  method  we  are  employing,  the  problem  of  estimating  no  motion  is 
the  same  as  the  one  of  estimating  a  non-trivial  motion. 

In  each  of  the  two  cases,  we  generated  n  =  10  independent  random  points 
(x»i  y»)i  *  =  1, . .  • ,  10,  uniformly  distributed  on  the  unit  disc,  by  taking 

(x.,y<)  :=  (\fuiicos{2nUi2),\/Uusm{2nUi2)),  (5.1) 

where  Un,  Ui2  are  independent  uniform  random  variables  on  (0,  l).  This  is  legitimate 
because  the  density  function  of  the  point  (z,,y,)  is  then,  by  change  of  variables, 

d(x,',y.) 


the  density  function  of  (Un,Ui2) 


d(UiuUi2) 


=  1 


- — cos(27rC/j2)  —2ny/U^s\n(2nUi2) 


2^7 


2 vfe 


sin(2^17,j)  2ny/Un  cos(2nUi2) 


=  —  on  the  unit  disc. 
ir 


(5.2) 
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We  fixed  these  generated  points  {(z,-,  y,)  |  i  =  1, . . . ,  10}  to  serve  as  our  prominent 
points  in  the  entire  simulation. 

Then  we  took  p  =  .8  as  the  probability  for  each  prominent  point  to  be  observed 
in  an  image  frame,  and  accordingly  selected  the  observed  points  that  were  to  be 
fixed  throughout  the  entire  simulation  yielding  (mo.mj.mj)  =  (6,8,8)  for  one  case 
which  we  call  Case  I,  and  (m0,  mu  m2)  =  (5, 7, 8)  for  the  other  which  we  call  Case  II. 

Three  values  of  the  true  scale  of  Gaussian  noise  a  were  chosen  representing  the 
ranges  of  ulow”  (<r  =  .02),  umodtratt1>  {a  =  .05),  and  “ high ”  (a  =  .10)  noise  images 
respectively.  See  Figure  5.1  for  examples  of  the  two  simulated  image  frames  for 
each  value  of  a  in  Case  I,  and  Figure  5.2  in  Case  II. 

Also  the  same  three  values  {.02,  .05,  .10}  were  chosen  for  cra  in  the  estimation 
procedure  so  that  there  were  total  of  9  choices  of  (o,oa),  namely  {.02,  .05,  .10}2. 

From  now  on  throughout  the  entire  chapter  we  will  refer  to  the  situation  where 
we  have  a  particular  choice  of  (a ,cra)  among  the  ones  given  above  by  Situation(a,a0), 
or  just  Situation  if  there  will  be  no  confusion. 

In  each  Case,  and  for  each  Situation(<7,  o„),  we  ran  1000  trials.  Each  trial  con¬ 
sisted  of  generating  the  observed  locations  in  the  two  image  frames  by  adding  the 
independent  error  vectors  according  to  the  Gaussian  noise  .V(0,o2I2)  to  the  true 

A  A 

locations,  and  producing  0  and  T  as  the  estimates  of  0  and  T  by  the  steps  de¬ 
scribed  in  Subsection  3.3.4  with  the  following  exception.  In  STEP  9,  if  ©max  =  0, 
then  we  assigned  the  values  alternately  between  0  =  4,  Tt  =  Tv  =  3,  and  6  =  —4, 

A  A 

Tt  =  Ty  =  —3  for  the  purpose  of  tabulation.  Remember  that  in  the  actual  estima¬ 
tion,  0  €  ( — tt,  7r]  always,  and  seldom  will  we  get  12^1  >  2  or  |TV|  >  2  in  practice 
because  all  the  true  locations  are  within  the  unit  disc. 

For  a  given  Case  and  trial  number,  the  observed  locations  for  different  Situations 


’-14  -1  JO  -04  0.0  04  1.0  14  -14  -1.0  -04  0.0  04  1.0  14 

(a)  Image  Frame  1  (b)  Image  Frame  2 

Low  Noise  Low  Noise 


’  -14  -1.0  -04  04  04  1.0  14  '  -14  -1.0  -04  0.0  04  1.0  14 


(c)  Image  Frame  1  (d)  Image  Frame  2 

Moderate  Noise  Moderate  Noise 


(e)  Image  Frame  1 
High  Noise 


(f)  Image  Frame  2 
High  Noise 


Figure  5.1:  Examples  of  the  two  simulated  image  frames  for  each  value  of  a,  (a)(b) 
<7  =  .02  (low  noise),  (c)(d)  a  =  .05  (moderate  noise),  (e) (f)  o  =  .10  (high  noise)  in 
Case  I,  (m0, mt, m?)  =  (6,8,8). 
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-1.0  -03  0.0  0.5  1.0  -1.0  -03  0.0  03  1.0 

(e)  Image  Frame  1  (f)  Image  Frame  2  *1 

High  Noise  High  Noise  vj 

Figure  5.2:  Examples  of  the  two  simulated  image  frames  for  each  value  of  <r,  (a)(b)  i 

a  =  .02  (low  noise),  (c)(d)  a  =  .05  (moderate  noise),  (e)(f)  a  =  .10  (high  noise)  in 

Case  II,  (m0,mi,m2)  =  (5,7,8).  > 
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were  related.  Thus,  in  image  frame  *,  *  =  1,2, 

an  observed  location  =  the  corresponding  true  location  +  ae<, 

where  e,  depends  on  the  Case  and  trial  number,  but  does  not  change  with  either  a 
or  oa. 

5.2  Results  of  the  Simulation 

A  A  ««V 

For  simplicity  we  use  7  to  represent  any  one  of  the  estimates  6,  Tz,  or  Ty,  and  also 
use  7(i)  to  denote  the  t-th  order  statistic  of  the  1000  7’s  that  were  produced  in  each 
Situation(<r,  oa). 

In  each  Case,  we  produced  the  histograms  of  the  estimates  0,  T*,  and  Tv  for  each 
Situation(<r,cra).  See  Figures  5.3  -  5.8. 

Then  we  compared  the  sample  distribution  of  7  with  the  normal  distributions  by 
costructing  the  normal  qq-plot.  With  a  normal  qq-plot  one  can  conveniently  get  the 
estimates  m,  S  of  the  mean  and  the  standard  deviation  of  the  normal  distribution 
s2)  by  which  the  sample  distribution  of  7  is  approximated,  by  fitting  a  straight 
line  to  them  and  finding  the  y-intercept  and  the  slope.  We  fitted  the  straight 
line  by  using  Krasker  and  Welsch’s  robust  regression  technique  [Krasker  79]  to  the 
points  on  the  plot  that  belong  to  the  [40, 60]-percentiles  interval,  or  equivalently  the 
(—.253,  .253]-normal  scores  interval  in  order  to  capture  the  behaviour  of  the  ones  in 
the  middle.  After  fitting  the  straight  line,  we  estimated  the  standard  deviation  of 
7(0  by  the  formula  [Chambers  83, Kendall  77a], 

(5.3) 

where  p<  =  (i  —  .5)/1000  for  *  =  1,...,1000  and  <f>,  $  are  the  probability  density 
function,  and  the  cumulative  distribution  function  of  the  standard  normal  distri¬ 
bution,  respectively.  Then  we  showed,  for  each  plotted  point,  ±4sd(7(  )  bounds 
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Figure  5.3:  Histograms  of  0  for  each  Situation  in  Case  I,  1000  trials  each.  The  two 
numbers  in  parentheses  represent  a  and  <7„  respectively. 
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Figure  5.4:  Histograms  of  for  each  Situation  in  Case  I,  1000  trials  each.  The  two 
numbers  in  parentheses  represent  a  and  a,  respectively. 
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(-10,05) 
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(-10,10) 


Figure  5.5:  Histograms  of  for  each  Situation  in  Case  I,  1000  trials  each.  The  two 
numbers  in  parentheses  represent  a  and  ca  respectively. 
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•4  -2  0  2  4  6  -4  -3-2  -1  0  1  2  3  4  -3  -2  -1  0  1  2  3 

Rotation  Rotation  Rotation 


(-10,-02)  (-10,-05)  (-10..10) 

Figure  5.6:  Histograms  of  6  for  each  Situation  in  Case  II,  1000  trials  each.  The  two 
numbers  in  parentheses  represent  a  and  oa  respectively. 
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-3  -2  -1  0  1  2  3  4  -1.0  -0.6  -02  02  0.4  -0.8  -0.4  0.0  02 

Xtranslation  Xtranslation  Xtranslation 

(-10..02)  (.10, .05)  (.10, .10) 


Figure  5.7:  Histograms  of  fx  for  each  Situation  in  Case  II,  1000  trials  each.  The 
two  numbers  in  parentheses  represent  a  and  oa  respectively. 
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Figure  5.8:  Histograms  of  fv  for  each  Situation  in  Case  II,  1000  trials  each.  The 
two  numbers  in  parentheses  represent  a  and  <ra  respectively. 


around  the  line  fitted  to  the  plot.  This  may  give  us  some  sense  of  how  well  the 
sample  distribution  of  7  is  approximated  by  the  normal  distributon 

A  a  At 

We  have  presented  the  normal  qq-plots  of  0,  T„  and  Tv  with  the  bounds  in 
Figures  5.9  -  5.11,  for  each  Situation  in  Case  I,  and  in  Figures  5.12  -  5.14,  for  each 
Situation  in  Case  II. 

As  expected,  every  Situation(.02,<ra)  gave  us  a  satisfactory  result  except  that 

A  ^ 

when  a a  =  .10,  6  and  Tx  seem  to  have  long  left  tailed  distributions  in  both  Cases. 

For  Situations(. 05, cra),  all  but  when  ca  =  .10  in  Case  II,  6  and  Tx  show  long  left 
tailed  distributions  while  Ty  consistently  shows  satisfactory  results. 

For  Situations(.10,oa)  every  thing  seems  to  fall  apart.  When  oa  —  .10  our 
approach  seems  to  confuse  nearby  points.  When  oa  =  -05  or  .02,  the  standard  for 
matching  difference  vectors  is  difficult  to  meet,  and  our  method  frequently  fails  to 
find  satisfactory  matches. 

We  have  summarized  the  information  obtained  from  the  normal  qq-plots  in 
Tables  5.1  -  5.6.  They  show  m,  3  of  7  and  corresponding  oml>  the  ratio  5/oml 
along  with  the  number  of  points  outside  the  bounds  of  ±4  standard  deviations  from 
the  fitted  line  for  each  situation. 

In  Table  5.7,  we  have  collected  all  the  ratios  S/oml  in  Case  I,  and  in  Table  5.8, 
the  same  thing  in  Case  II.  From  them  you  can  easily  see  that  the  ratios  S/oml  are 
the  smallest  when  a  —  <7a  and  the  largest  occur  when  a  =  .10  and  oa  =  .02,  namely 
when  we  assumed  too  low  a  value  for  the  standard  deviation. 
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Figure  5.13:  Normal  qq-plots  of  T,  for  each  Situation  in  Case  II,  1000  trials  each. 
The  two  numbers  in  parentheses  represent  a  and  <70  respectively. 
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■ 

Summary 

m 

Statistics 
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.05 

.10 
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.0146 

A 

a 

.0224 

.0232 

.0411 

.02 

&ML 

.0182 

.0182 

.0182 

3/&ML 

1.2308 

1.2747 

2.2582 

#  Outside 
±4  sd  Bounds 

32 

79 

320 

fH 

-.0016 

.0331 

.0211 

A 

a 

.1567 

.0612 

.0895 

.05 

&ML 

.0456 

.0456 

.0456 

3.4364 

1.3421 

1.9247 

#  Outside 
±4  sd  Bounds 

605 

312 

m 

-.0861 

.0063 

A 

S 

.7240 

.2741 

.1694 

.10 
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.0912 

.0912 

.0912 

A  /  A 

S/^ML 

7.9386 

3.0055 

1.8575 

#  Outside 
±4  s<J  Bounds 

(25)  650 

453 

295 

Table  5.1:  Summary  from  the  normal  gg-plots  of  6  in  Case  I,  1000  trials  each. 

The  numbers  in  the  parentheses  represent  the  numbers  of  trials  with  “No  Matches” 
if  there  are  any. 
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#  Outside 

±4  sd  Bounds 

23 

0 

181 

to 

-.0161 

MSM 

.0279 
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s/<7MV 
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2.4710 

1.3462 

#  Outside 

±4  sd  Bounds 

(25)  255 

523 

420 

Table  5.2:  Summary  from  the  normal  99-plots  of  Tx  in  Case  I,  1000  trials  each. 
The  numbers  in  the  parentheses  represent  the  numbers  of  trials  with  “No  Matches 
if  there  are  any. 


Table  5.3:  Summary  from  the  normal  97-plots  of  Tv  in  Case  I,  1000  trials  each. 
The  numbers  in  the  parentheses  represent  the  numbers  of  trials  with  “No  Matches” 
if  there  are  any. 
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(100)  668 
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Table  5.4:  Summary  from  the  normal  qq- plots  of  6  in  Case  II,  1000  trials  each. 
The  numbers  in  the  parentheses  represent  the  numbers  of  trials  with  “No  Matches” 
if  there  are  any. 


Table  5.5:  Summary  from  the  normal  gg-plots  of  Tx  in  Case  II,  1000  trials  each. 
The  numbers  in  the  parentheses  represent  the  numbers  of  trials  with  “No  Matches” 
if  there  are  any. 
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Table  5.6:  Summary  from  the  normal  gg-plots  of  Ty  in  Case  II,  1000  trials  each. 
The  numbers  in  the  parentheses  represent  the  numbers  of  triads  with  “No  Matches” 
if  there  are  any. 
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1.4873 
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2.0238 

1.4745 

1.3344 

Table  5.7:  The  ratios  s/d ml  in  Case  I,  1000  trials  each. 
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Table  5.8:  The  ratios  S/cml  in  Case  II,  1000  trials  each. 
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Chapter  6 
Conclusions 


The  problem  of  estimating  a  rigid  body  motion  from  two  noisy  images  of  an  object 
taken  at  two  different  times  has  been  studied. 

The  available  data  consist  of  the  unordered  locations  of  some  of  the  prominent 
points  of  the  object.  Because  it  is  assumed  here  that  these  points  are  not  individually 
recognized  and  some  of  them  may  be  missed  in  one  or  both  of  the  images,  it  is  not 
obvious  which  points  of  the  two  images  correspond  to  one  another.  Moreover  the 
observed  locations  are  subject  to  error. 

A  computational  procedure,  capitalizing  on  the  rigidity  of  the  object,  has  been 
proposed  for  estimating  the  motion  parameters  of  the  object  in  the  presence  of 
U  (0, 0*1.2)  Gaussian  noise  independently  added  to  the  positions  in  the  images  of 
the  points  observed. 

In  principle,  one  might  apply  maximum  likelihood  to  the  estimation  problem, 
but  the  difficulty  in  formulating  and  calculating  the  likelihood  function  under  the 
above  mentioned  assumptions  is  formidable.  However  one  ought  to  expect  to  do 
better  when  the  observed  points  are  recognized  and  the  common  points  among  them 
are  matched  without  error  in  identification. 

For  this  favorable  situation,  where  the  likelihood  function  can  be  easily  formu¬ 
lated,  the  asymptotic  normality  and  consistency  of  the  maximum  likelihood  estimate 
as  a  — ►  0,  have  been  proved.  Also  the  asymptotic  covariance  matrix  for  that  has 
been  derived  explicitly.  Similar  results  hold  even  when  a  is  incorrectly  assumed  to 
be  oa  —  ka  for  positive  k  ^  1. 

Simulation  results  have  shown  that  in  most  cases  where  we  either  know  the 
correct  a  (i.e.,  oa  =  a),  or  assumed  a  larger  value  for  that  ( oa  >  a),  the  proposed 
estimator  does  reasonably  well  compared  with  the  maximum  likelihood  estimate, 
considering  the  fact  that  the  latter  is  under  a  favorable  situation. 

Computationally,  there  is  still  much  to  be  desired  and  it  will  be  an  interesting 
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problem  to  produce  an  algorithm  which  reduces  the  cost  of  computing  for  this 
method. 

One  important  generalization  of  this  method  will  be  to  allow  a  noise  structure 
where  the  nearby  observed  locations  are  correlated,  which  is  often  the  case  in  real 
world  data. 
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_  A 

A.l  The  Almost  Sure  Uniqueness  of  0ml(X) 


In  the  following  discussion,  we  will  continue  to  use  the  same  notation  as  in  Chapter  4, 


and  assume  m0  >  2. 


From  the  definition  of  AfLpt)  and  Equation  (4.16),  it  is  clear  that  ^ml(X)  can 
be  defined  to  be  the  value  of  fi  €  0  which  minimizes  |x  —  G(/3)||3  \  or  equivalently 
||X-G(«||. 

First,  let  us  define  the  subset  Hq  of  0  by 


n°:=j0=  £  I  M1  =  M11®lmo,M116RJ,tf  e(-jr,jr],r€RJ|,  (A.l) 


that  is,  n0  is  the  subset  of  fl  consisting  of  /?’s  which  correspond  to  m0  identical 


points.  Let  G( fl)  be  the  range  of  the  function  G: 


G( n)  :=  {G(fi)  |/?€n>CR4m°. 


(A.2) 


Similarly  let 


G(n0)  :=  {G{fi)  \fien0}c  G(n)  c R4m°. 


(A.3) 


Then  we  can  rephrase  Lemma  4.2  as  the  following  lemma. 


Lemma  A.l  The  function  G  restricted  to  fl\fi  o  is  one-to-one  onto  G(fl)  \  G(f20). 


We  need  the  following  lemma  to  proceed. 


Lemma  A.2  G(Q)  is  a  closed  subset  o/R4m°. 


Proof  :  Let  yW  :=  G(^)  with  ^  e  fi,  *  =  1,2, ...  be  a  sequece  in  G(fl)  that 
converges  to  a  point  y(°5  6  R4m°.  For  each  *  =  1,2, . . .,  let 


:=  ,  m(<)  €  RJm°,  8®  €  (— tt.tt],  T*0  e  R2, 

j*(‘) 


mm 


so  that 


M(0 

7  ‘  =  (U(tfW)  ®  i^)MW  +  T«  ®  1^  .  ‘ 

Then  clearly  the  j»W  converges  to  some  M(0)  €  R,m°. 

Since  (— x,  x]  is  compact  and  (-x,  x]  C  [-x,  x],  by  taking  a  subsequence,  we  can 
Moling  that  the  converge  in  [— x,x].  Then  the  U(flW)  converge  to  some 
tf(°)  g  (— x,x],  and  it  follows  that  the  converge  to  some  €  R*.  Therefore 

y(0)  =  UmyW  =  G(^0)), 


where 


u(°) 

o(°) 

y(°) 


€  n. 


Hence  y(°)  €  (7(H).  So  G(H)  is  closed.  This  completes  the  proof. 

Let  A  be  the  Lebesgue  measure  on  R4m°.  Then  the  following  arguments  will 
show  that  £ML(X)  is  unique  for  A -almost  all  X  €  R4m°,  and  hence  almost  surely, 
because  the  Gaussian  probability  measure  is  absolutely  continuous  with  respect  to 


A. 


For  any  x  €  Rm,  and  any  set  F  C  Rm,  let 


Mx)  :=  -  y||  I  y  €  F>. 


(A.4) 


Note  that  if  x  €  F,  then  Mx)  =  0.  We  have  the  following  propositions. 


Proposition  A.l  For  any  non-empty  closed  set  F  C  Rm  and  any  x  €  Rm,  x  €  F 
if  and  only  if  £f(x)  =  0. 

Proof  :  We  only  need  to  prove  the  “if"  part.  Suppose  Mx)  =  0.  Then,  for  each 
n  =  1,2, . . .,  there  exists  a  y„  €  F  such  that 

l|x  -  yn||  <  Mx)  +  ^ 

n  n 

Hence  the  yn  converge  to  x,  and  since  F  is  closed,  x  €  F. 


Proposition  A.2  For  any  non-empty  closed  set  F  C  Rm,  and  any  x  €  Rm,  there 
is  ay  €  F  with  ||x  -  y||  =  SF(x). 


Proof  :  Without  loss  of  generality,  we  can  restrict  y  to  the  set 

Sx  :=  {y  €  F  |  ||x  -  y|j  <  2^(x)} , 

which  is  compact,  and  on  which  the  function  y  * — ►  ||x  —  y||  is  continuous.  Thus  it 
attains  its  minimum  on  3x  and  therefore  on  F. 

Proposition  A.3  For  any  non-empty  set  F  C  Rm,  SF  is  a  Lipschitz  function. 
Proof  :  For  any  Xi,x3  €■  Rm,  and  any  e  >  0,  we  can  find  ay  ,6f  such  that 

||x3-y«||  <  £p(xj)  +  e. 


Then 


M*0  -  6f(x 3)  <  ||xa  -  y.||  -  ||x3  -  y.||  +  e 


<  ||xi-x2||+e. 


Hence,  by  letting  e  — »  0  and  then  by  symmetry,  we  have 


IMxi)  -  Mxj)|  <  ||xx  -  X2|| . 


(A.5) 


This  completes  the  proof. 

A  function  t  :  Rm  — ►  R1  is  called  totally  ( Frichet )  differentiable  at  Xo  €  Rm 
if  the  (row)  gradient  vector  dt(x)/dx|x=Xo  exists  and  t(x)  admits  the  first  order 
Taylor  Expansion  at  x  =  x0: 


t(xo  +  c)  =  t(x0)  + 

\ 

1 


dt(x) 


dx 


s  +  o(||e| 


X=Xo 


as  c  — ►  0. 


(A.6) 


Let  Am  be  the  Lebesgue  measure  on  Rm.  We  have  the  following  theorem  due  to  H. 
Rademacher  ([Rademacher  19]). 


Theorem  A.l  (Rademacher’s  Theorem)  Any  real-valued  Lipsehitz  function  on 
an  open  set  O  in  Rm  m  totally  differentiable  at  \m-almost  all  x  €  O. 


Proof  :  See,  for  example,  [Federer  69,  pp.  216-217]. 

Federer  [Federer  59,  4.8.  Theorem]  has  proved  the  following  theorems. 

Theorem  A.2  For  any  non-empty  closed  set  F  C  Rm,  and  y  £  F,  the  set 

«( y)  “  {r  €  E“  |  «,(y  +  «)  -  ||*||}  (A.7) 

is  convex. 

Proof  :  By  translating  the  set  F  by  —  y,  we  can  assume  that  y  =  0.  Then 
*  €  Q(y)  if  and  only  if  ||b  —  z||  >  ||z||  for  all  b  6  F. 

Furthermore 

||b  —  z||*  -  ||z||*  =  bT(b  -  2z)  for  any  b,  z  €  Rm, 

and  consequently,  if  b,  z,  w  €  Rm,  0,7  >  0,  a  +  7  =  1,  then 

||b-  (az  +  7w)||l  —  ||az  +  7w||*  =  bT(b  -  2az  -  2tw) 

=  bT  [a(b  -  2z)  +  7(b  -  2w)] 

=  abT(b  -  2z)  +  7br(b  —  2w) 

=  a(||b  -  «f  -  ||*||’)  +  l(||b  -  w||J  -  |w||’). 

It  follows  that  Q(y)  is  convex. 

Theorem  A.3  For  any  non-empty  closed  set  F  C  R”\  if  x  £  Rm  \  F  and  6?  is 
totally  differentiable  at  x,  then  there  exists  a  unique  y  €  F  such  that  ||x  —  y||  = 
6f{x). 
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Proof  :  By  Propositions  A.1  and  A.2,  there  is  a  y  €  F  such  that  ||x  -  y||  = 
Sr  (x)  >  0.  Then  x  —  y,  0  €  <5(y)  by  definition  of  <J(y),  and  hence  by  Theorem  A.2 
(1  —  a)(x  —  y)  €  Q(y)  for  0  <  a  <  1.  Let 


g  :=  -a(x  -  y).  (A.8) 

Then  we  have 

$*-(x+*)  =  My  +  (i-<*)(x-y)] 

=  ||(i-a)(x-y)|| 

=  Sr(x)  —  aSr(x)  for  0  <  a  <  1. 


Thus  from  the  total  differentiability  of  Sr  at  x,  as  a  |  0,  we  must  have 


dSF[x)  x  -  y 
dx  ^f(x) 


dSr(x)  g 
dx  pj| 

Sr(x  +  g)~Sr(x)  ^  o() Ja||) 


and  hence  by  taking  the  limit  as  a  j  0,  we  get 


3Sr{x)  x  -  y 
dx  $F(x) 

Since  ||d£p(x)/dx||  <  1  from  (A.5),  it  follows  that 

dM*)  _  x-y 
dx  SF{x)' 


(A.9) 


(A.10) 


Therefore  such  y  must  be  unique. 

Applying  Theorems  A.l  and  A.3  with  O  =  R4m°  \  G(0),  F  =  G(n),  along  with 
the  fact  that  for  any  x  €  F,  x  itself  is  the  unique  y  €  F  satisfying  ||x  —  y||  =  $f(x), 
we  have  proved  the  following  theorem. 


Theorem  A.4  There  exists  a  set  N  C  R4m°  with  X(N)  =  0  such  that  for  any 
X  €  R4m*  \  N,  there  exists  a  unique  y(X)  €  G(O)  that  satisfies 


||X-y(X)||  =  *«(n)(X).  (A. 11) 

A  similar  theorem  has  been  proved  by  A.  Pizznan  ([Pizman  84]). 

Let  N  be  a  set  in  R4"*0  chosen  to  satisfy  the  properties  of  Theorem  A.4.  If 
X  €  R im°\N,  y(X)  €  G(fl)  of  Theorem  A.4  is  unique.  Suppose  that  y(X)  £  G(Q0). 

A  A 

Then  by  Lemma  A.l,  0ml(X)  is  unique.  Hence,  to  show  that  /9ml (X)  «  almost 
surely  unique,  it  suffices  to  show  that  A(Af)  =  0  where 

M  :=  {X  €  R4m°  \  N  |  y(X)  €  G(n0)>.  (A.12) 

Theorem  A.5  A  (M)  =  0. 


Proof  :  Let 


Then  y(X)  €  G(n0). 

AT 

.  Ml 

For  any  $ 


X:= 


Xt 

X2 


€  M,  X„X,  €  RSm°. 


0 

A/ 

T 


€0,  6,€  R”"”,  ?€(-*,*),  fstf, 


X  -  C(«|  =  ||X,  -  5,11’  +  ||x,  -  (U(S)  8  Lv,)5j  -  f  8  1. 

=  l|X,  -  6,11’  +  ||(U(«)T  ®  I~.)X,  -  V(Sft  8  lm.  -  5,[’,(A.13) 

since  left  multiplication  by  U(0)  preserves  the  norm,  and*  so  does  the  one  by 

U($)T®Imo. 

Now  let  us  choose  a  fi0  := 
y(X)  *  G(fi0).  Then 


Mo 
*o 

i  n  j 


€  n0,  # lo  €  R,mc,  *o  €  (-*,  sr],  T0  €  R*  with 


JX-G(6o)ll’  =  jnln  ||x-G(«r 
66  0 


=  min  RX  —  G 

ftea— 


Mi 

iTo 


=  .min  [PC.-AI’  +  PC;-?.!’]  (A.14) 

M,  6R*-*  1  J 


where 


X'x  :=  (U(*0)T  ®  IM#)XS  -  U(d0)rT0  ®  W  (A.15) 

Now  the  last  minimization  problem  in  (A.14)  has  a  unique  solution  for  £4, 

(A. 16) 


„  Xi  +  Xi 
Mi  =  — j— , 


which,  by  Lemma  A.l,  must  be  of  the  form  ®  l**  for  some  /in  G  R3  because 
y(X)  G  G(fio).  Then  it  follows  that 

Xj  +  (U(^0)T  ®  Imo)Xj  =*  c  ®  I*,.  for  some  z  G  R*, 


or 

Thus  we  have 

Therefore 

AfC  jx|x  = 


X,  =  -(U(*0)  ®  I^JXi  +  w  ®  for  some  w  €  Ra. 


X  = 


Xi 

~(U(tf0)  ®  L»o)Xi  +  w  ®  1, 


(A. 17) 


Xx 


,Xi  g  RJm° , 60  €  (-*,*■], w  G  R*| 


l-(U(«0)®U)Xl+w®lmo 
and  hence  M  is  contained  in  a  smooth  manifold  of  dimension  <  2mo  +  3  <  4m0.  It 
then  follows  that 

A(M)  =  0. 

In  summary,  we  have 


Theorem  A.6  The  maximum  likelihood  estimate  j?ML(X)  it  almost  surely  unique. 
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